1. Introduction

Principal Component Analysis (PCA) is arguably the most common tool in
high-dimensional data analysis. It approximates a given data set by a
lower-dimensional subspace obtained from solving an $\ell_2$ optimization problem.
While such an $\ell_2$ minimization can be easily implemented to run fast for
moderate-size data, it is not robust to outliers. That is, the estimated subspace
can change significantly when adding points sampled from a very different
distribution. This obstacle motivated the development of many algorithms for
robust PCA, some of which are based on $\ell_1$ minimization. Their robustness is
often theoretically guaranteed when restricting both the distribution and the
fraction of outliers.

Here, we study the robustness to outliers of a "geometric $\ell_1$ minimization"
for subspace recovery. In fact, we discuss the robustness of the following
geometric $\ell_p$ minimization for all $p > 0$: for a data set
$\mathcal{X} \subset \mathbb{R}^D$, it tries to minimize among all $d$-dimensional
subspaces, $L$, the quantity

$$e_{\ell_p}(\mathcal{X}, L) = \sum_{x \in \mathcal{X}} \mathrm{dist}(x, L)^p, \tag{1}$$

where $\mathrm{dist}(x, L)$ denotes the Euclidean distance between a data point $x$
and the subspace $L$. In this paper, we restrict this minimization to
$d$-dimensional linear subspaces, which we refer to as $d$-subspaces.
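
As an illustration of the energy (1), the following is a minimal sketch that evaluates it for a subspace given by an orthonormal basis. The function names and the toy data are assumptions made for this example; they do not come from the paper.

```python
import math

def dist_to_subspace(x, basis):
    """Euclidean distance from x to span(basis); basis vectors are assumed orthonormal."""
    sq_norm = sum(c * c for c in x)
    sq_proj = sum(sum(q_j * x_j for q_j, x_j in zip(q, x)) ** 2 for q in basis)
    return math.sqrt(max(sq_norm - sq_proj, 0.0))   # clamp round-off below zero

def lp_energy(points, basis, p):
    """Energy (1): sum over the data points of dist(x, L)^p, for p > 0."""
    return sum(dist_to_subspace(x, basis) ** p for x in points)

# Hypothetical data: three inliers on the x-axis and one outlier in R^2.
points = [(1.0, 0.0), (-2.0, 0.0), (0.5, 0.0), (3.0, 4.0)]
x_axis = [(1.0, 0.0)]
print(lp_energy(points, x_axis, p=1))   # only the outlier contributes: 4.0
print(lp_energy(points, x_axis, p=2))   # squared distance of the outlier: 16.0
```

The comparison of the two printed values shows how the outlier's contribution grows with $p$, which is the reason smaller $p$ is associated with robustness.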

The geometric $\ell_1$ minimization is related to some of the recent attempts for
robust PCA [36, 37, 23, 40, 18]. However, it is hard to implement directly since
it is not convex (the set of $d$-subspaces, over which the $\ell_1$ energy is
minimized, is not convex). Nevertheless, the question of its robustness is
fundamentally interesting. While the analysis in [19] implies such robustness when
restricting the fraction of outliers, here we ask a more challenging question for
the recovery of a single subspace: can it be recovered from a sufficiently large
sample when there is no restriction on the fraction of outliers, but only on their
distribution? One possible instance is when the outliers are spherically symmetric
(i.e., invariant to rotations). We make the problem even more interesting by
assuming points sampled from multiple subspaces as well as spherically symmetric
outliers, and we study the recovery of the most significant subspace (i.e., the
subspace with the largest number of points) by geometric $\ell_1$ (or $\ell_p$)
minimization.

1.1. Background and Related Work 

The $\ell_1$ norm has been widely used to form robust statistics [16, 21, 26]. The
early principle of least absolute deviations for robust regression minimizes the
sum of absolute values of residuals. For example, in linear regression it
minimizes the sum of the absolute values of the deviations of the dependent
variable observations from the fitted linear estimator based on the independent
variable observations. It is a natural robust alternative for the least squares
regression and actually emerged independently of least squares regression (see
e.g., historical review in [14, 15, 8]).

Osborne and Watson [25] suggested the use of the sum of absolute values of
residuals in total regression problems, where observational errors of both
dependent and independent variables are taken into account. This is a robust
alternative for the total least squares problem, which can be described
geometrically as finding a subspace minimizing the sum of squares of orthogonal
errors [17]. Addressing this geometric aspect, Späth and Watson [27] as well as
Nyquist [24] suggested $\ell_1$ orthogonal regression for fitting a hyperplane by
minimizing the sum of orthogonal distances (see also [1]). Watson [33, 34] even
suggested an orthogonal procedure for fitting a surface to data. Ding et al. [7]
focused on fitting a linear subspace minimizing (1) and viewed it as robust PCA,
which is invariant to rotations (in fact, the minimization of (1) was proposed
even earlier by David and Semmes [6] for $p > 1$ in a pure analytic setting free
of outliers). Zhang et al. [41] have formulated an online procedure for this
minimization, which also applies to approximating data by multiple subspaces.

G. Lerman and T. Zhang / $\ell_p$-Recovery of the Most Significant Subspace

Recently, several convex algorithms for robust PCA (with provable exact recovery)
have been suggested [5, 2, 36, 37, 23, 40, 18]. In [36, 37, 23, 40, 18] the
problem of fitting a subspace to data is translated into fitting a low-rank matrix
to a given matrix, whose columns represent the data points, where outliers
correspond to grossly corrupted columns. Both [40] and [18] propose a convex
relaxation of the minimization in (1). We also view one of the terms in the energy
of [36, 37, 23] (namely, the sum of $\ell_2$ norms of column vectors) as an
analogue of the energy (1) when the columns of the corresponding matrix for this
term are the projections of the data points onto the orthogonal complement of the
subspace. In the case of spherically symmetric outliers with no restriction on
their fraction, it is currently unknown if exact recovery is guaranteed for any of
the algorithms in [36, 37, 23, 40, 18]. On the other hand, we show here that such
guarantees exist for geometric $\ell_1$ minimization. To make the problem even
more challenging, we find it interesting to ask about the geometric $\ell_1$
recovery of a subspace containing the largest number of points (i.e., the global
$\ell_0$ subspace) among multiple subspaces within spherically symmetric outliers.
We see this question as a geometric generalization of basis pursuit, where
$\ell_1$ minimization can be used to solve an $\ell_0$ minimization under some
conditions [3, 10, 9, 4].

The setting of searching for the best $\ell_0$ subspace among multiple subspaces
is related to the problem of sequential recovery of multiple subspaces buried in
outliers, or in short, sequential Hybrid Linear Modeling (HLM). That is,
recovering the most significant subspace among those subspaces, then removing the
points along it (or in a strip around it) from the given data and repeating this
procedure according to the given number of subspaces. A sequential HLM algorithm
was suggested by Yang et al. [39] using the Random Sample Consensus (RANSAC) [12]
heuristic to find a single subspace iteratively. This RANSAC strategy repeatedly
applies the following two steps: 1. randomly select a set of $d$ independent
vectors; 2. count the number of data points within a strip of width $\epsilon$
around the $d$-subspace spanned by those $d$ vectors (both $\epsilon$ and the
number of iterations of these two steps are parameters set by the user). The final
output of this algorithm is the $d$-subspace maximizing the quantity computed in
step 2.
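
The two RANSAC steps described above can be sketched as follows. This is an illustrative sketch, not the implementation of [39] or [12]: the function names are hypothetical, the $d$ vectors are drawn from the data points themselves, and "within a strip of width $\epsilon$" is interpreted here as distance at most `eps` from the candidate subspace.

```python
import math
import random

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt; returns None if the vectors are numerically dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(q, w))
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm < tol:
            return None
        basis.append([a / norm for a in w])
    return basis

def dist_to_subspace(x, basis):
    sq = sum(a * a for a in x) - sum(sum(a * b for a, b in zip(q, x)) ** 2 for q in basis)
    return math.sqrt(max(sq, 0.0))

def ransac_subspace(points, d, eps, n_iter, rng):
    """Return the candidate d-subspace with the most points within distance eps."""
    best_basis, best_count = None, -1
    for _ in range(n_iter):
        basis = orthonormalize(rng.sample(points, d))      # step 1: d random vectors
        if basis is None:
            continue                                        # dependent draw, try again
        count = sum(dist_to_subspace(x, basis) <= eps for x in points)   # step 2
        if count > best_count:
            best_basis, best_count = basis, count
    return best_basis, best_count

# Hypothetical data: 30 points on the line y = x and 10 points far from it.
inliers = [(k / 30.0, k / 30.0) for k in range(1, 31)]
outliers = [(k / 10.0, -k / 10.0 - 1.0) for k in range(1, 11)]
basis, count = ransac_subspace(inliers + outliers, d=1, eps=1e-6,
                               n_iter=200, rng=random.Random(0))
print(basis, count)
```

Both `eps` and `n_iter` are user-set parameters, as in the description above; the output is the candidate maximizing the count of step 2.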

Torr and Zisserman [30, 31] suggested a RANSAC-type strategy which selects a
subspace (among the random set of candidates) by minimizing a variant of the
$\ell_2$ distance from a subspace. This variant uses the square function until a
fixed threshold and a constant function for larger values.
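
A loss of this shape can be sketched as below. The choice of the constant as the squared threshold (so that the loss is continuous) and the function name are assumptions of this sketch; the exact form used in [30, 31] is not stated here.

```python
def truncated_quadratic_energy(distances, threshold):
    """Sum of min(dist^2, threshold^2): quadratic below the threshold, constant above it."""
    cap = threshold * threshold
    return sum(min(r * r, cap) for r in distances)

# Moving a far point even farther away does not change the energy.
print(truncated_quadratic_energy([0.1, 0.2, 5.0], threshold=1.0))
print(truncated_quadratic_energy([0.1, 0.2, 500.0], threshold=1.0))
```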

1.2. Basic Conventions and Notation 

We denote by $G(D, d)$ the Grassmannian space, i.e., the set of all $d$-subspaces
of $\mathbb{R}^D$ with a manifold structure. The geodesic distance between $F$ and
$G$ in $G(D, d)$ is

$$\mathrm{dist}_G(F, G) = \sqrt{\sum_{i=1}^{d} \theta_i^2}, \tag{2}$$

where $\{\theta_i\}_{i=1}^d$ are the principal angles between $F$ and $G$ (we
review these angles and their relation to geodesics in §3.2.1). Following §3.9 of
[22], we denote by $\gamma_{D,d}$ the "uniform distribution on $G(D, d)$". We
designate a ball in $G(D, d)$ by $B_G(L, r)$ as opposed to a Euclidean ball in
$\mathbb{R}^D$, $B(x, r)$. We refer to any of the global minimizers of (1) among
$L \in G(D, d)$ as a global $\ell_p$ subspace. Similarly, local minimizers of (1)
among $L \in G(D, d)$ are local $\ell_p$ subspaces.

By saying "with overwhelming probability", or in short "w.o.p.", we mean that the
underlying probability is at least $1 - Ce^{-N/C}$, where $N$ is the size of the
data set $\mathcal{X}$ and $C$ is a constant independent of $N$. We will also use
"w.p." as a shorthand for "with probability".

1.3. Setting of This Paper

We assume an i.i.d. data set $\mathcal{X} \subset \mathbb{R}^D$ of size $N$
sampled from a mixture distribution $\mu_\epsilon$ representing multiple
$d$-subspaces with outliers and noise level $\epsilon \geq 0$. In the noiseless
case ($\epsilon = 0$), the $K + 1$ components of the mixture measure are
$\{\mu_i\}_{i=0}^K$. The distribution $\mu_0$, which represents outliers, is
spherically symmetric on $\mathbb{R}^D$ (i.e., invariant to rotations of
$\mathbb{R}^D$), and $\{\mu_i\}_{i=1}^K$ are supported on distinct $d$-subspaces
$\{L_i^*\}_{i=1}^K$ respectively and are spherically symmetric within these
subspaces (i.e., invariant to rotations within these subspaces). We further assume
that $\{\mu_i\}_{i=0}^K$ have bounded supports. Moreover, they have nontrivial
support (i.e., $\mu_i(\{0\}) < 1$, $i = 0, \ldots, K$).

For the noisy case, we assume noise distributions $\{\nu_{i,\epsilon}\}_{i=1}^K$
with bounded support in the orthogonal complement of $L_i^*$, and for technical
reasons we assume that their $p$th moments are smaller than $\epsilon^p$ for all
$p \leq 1$ (when considering geometric $\ell_p$ minimization with $p > 1$ we only
need this condition with $p = 1$ and when considering geometric $\ell_p$
minimization with $p < 1$ we only need this condition with the relevant value of
$p$). For consistency, if $\epsilon = 0$, then $\{\nu_{i,0}\}_{i=1}^K$ are the
Dirac $\delta$ distributions within the orthogonal complement of
$\{L_i^*\}_{i=1}^K$ respectively.

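For any noise level $\epsilon \geq 0$, the mixture distribution $\mu_\epsilon$ has
the form

$$\mu_\epsilon = \alpha_0 \mu_0 + \sum_{i=1}^{K} \alpha_i \, \mu_i \times \nu_{i,\epsilon}, \tag{3}$$

where $\alpha_0 > 0$, $\alpha_i > 0$ $\forall\, 1 \leq i \leq K$ and
$\sum_{i=0}^{K} \alpha_i = 1$. If $\epsilon = 0$, then for convenience we replace
the notation $\mu_\epsilon$ by $\mu$, i.e.,

$$\mu = \sum_{i=0}^{K} \alpha_i \mu_i. \tag{4}$$

We also scale the supports of $\{\mu_i\}_{i=0}^K$ such that

$$\mathrm{supp}(\mu_i) \subseteq B(0, 1). \tag{5}$$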

The impact of radii of inliers and outliers will only be exemplified in Theorem 2.2. 

We refer to $\mu_\epsilon$ created according to this model as a spherically
symmetric HLM measure with noise level $\epsilon$ (sometimes we also add "w.r.t.
$\{L_i^*\}_{i=1}^K$"). In part of our setting, the assumption of spherical
symmetry of $\mu_0$ can be completely removed, while still assuming that
$\{\mu_i\}_{i=1}^K$ are the same and that (5) is satisfied. In this case we refer
to $\mu_\epsilon$ as a weakly spherically symmetric HLM measure with noise level
$\epsilon$.

Throughout this paper we assume the condition

$$\alpha_1 > \sum_{i=2}^{K} \alpha_i \tag{6}$$

and consequently say that $L_1^*$ is the most significant subspace. For the
noiseless case of $\epsilon = 0$, the most significant subspace coincides with the
global $\ell_0$ subspace (i.e., the subspace containing the largest number of
points) w.o.p. For the noisy case of $\epsilon > 0$, we view condition (6) as a
generalized notion of global $\ell_0$ subspace w.o.p., that is, having the highest
fraction of points "around" that subspace w.o.p.
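
To make the noiseless model concrete, the following is a minimal sampling sketch. It assumes one particular spherically symmetric choice (uniform distributions on unit balls, consistent with the scaling of the supports); the function names, the weights and the two coordinate lines in $\mathbb{R}^3$ are hypothetical and chosen only for illustration.

```python
import math
import random

def uniform_in_unit_ball(dim, rng):
    """Uniform sample from the unit ball of R^dim (one spherically symmetric choice)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in g))
    radius = rng.random() ** (1.0 / dim)
    return [radius * c / norm for c in g]

def satisfies_condition_6(alphas):
    """Condition (6): alpha_1 exceeds the sum of alpha_2, ..., alpha_K."""
    return alphas[1] > sum(alphas[2:])

def sample_hlm(n, alphas, bases, ambient_dim, rng):
    """Draw n points; label 0 marks an outlier, label i >= 1 the subspace bases[i - 1]."""
    labels = rng.choices(range(len(alphas)), weights=alphas, k=n)
    points = []
    for label in labels:
        if label == 0:
            points.append(uniform_in_unit_ball(ambient_dim, rng))
        else:
            basis = bases[label - 1]          # orthonormal vectors spanning L_i
            coeffs = uniform_in_unit_ball(len(basis), rng)
            points.append([sum(c * q[j] for c, q in zip(coeffs, basis))
                           for j in range(ambient_dim)])
    return points, labels

alphas = [0.5, 0.3, 0.2]                          # alpha_0 (outliers), alpha_1, alpha_2
bases = [[(1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)]]    # two lines in R^3
points, labels = sample_hlm(1000, alphas, bases, 3, random.Random(0))
print(satisfies_condition_6(alphas), labels.count(0), labels.count(1), labels.count(2))
```

Note that half of the sample consists of outliers here, yet condition (6) only compares the weight of the first subspace with the combined weight of the other subspaces.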

1.4. Mathematical Problems of This Paper 

We address here two mathematical problems. The simpler one is implicit in this
introduction, though clear from the proofs. It asks whether the most significant
subspace $L_1^*$ can be recovered when $\epsilon = 0$ by minimizing
$E_\mu(\mathrm{dist}^p(x, L))$ over all $L \in G(D, d)$. The main problem can be
formulated using the empirical distribution $\mu_N$ of an i.i.d. sample of size
$N$ from $\mu$. It asks whether $L_1^*$ can be recovered (w.o.p.) by minimizing
$E_{\mu_N}(\mathrm{dist}^p(x, L))$, which is equivalent to minimizing (1). In the
noisy case, we extend these problems to near recovery.

1.5. Main Theorems

In the noiseless case and $0 < p \leq 1$, we can exactly recover the global
$\ell_0$ subspace by $\ell_p$ minimization as follows.

Theorem 1.1. If $\mu$ is a spherically symmetric HLM measure on $\mathbb{R}^D$
with $K$ $d$-subspaces $\{L_i^*\}_{i=1}^K \subset G(D, d)$ and mixture
coefficients $\{\alpha_i\}_{i=0}^K$ satisfying (6), $\mathcal{X}$ is a data set of
$N$ points identically and independently sampled from $\mu$ and $0 < p \leq 1$,
then the probability that $L_1^*$ is a global $\ell_p$ subspace is at least
$1 - C\exp(-N/C)$, where $C$ is a constant depending on $D$, $d$, $K$, $p$,
$\alpha_0$, $\alpha_1$, $\mu_0$, $\mu_1$, and
$\min_{2 \leq i \leq K}(\mathrm{dist}_G(L_1^*, L_i^*))$.

The theorem guarantees exact recovery of $L_1^*$ w.o.p. for any percentage of
outliers $\alpha_0 < 1$. However, the probability of this event depends (through
the constant $C$) on the model parameters. Due to the non-convexity of the
underlying minimization, we are unable to specify the direct dependence of this
probability on the model parameters, even for very special cases.$^1$ However, we
can estimate the probability that $L_1^*$ is a local minimum when $K = 1$. For
example, it follows from Theorem 2.2 (which appears later in §2.2) that if
$p = 1$, $\mu_1$ is uniform on $L_1^* \cap B(0, R_1)$, $\mu_0$ is uniform on
$B(0, R_0)$ and there are $qN$ i.i.d. samples from $\mu_1$ and $(1 - q) \cdot N$
i.i.d. samples from $\mu_0$, then $L_1^*$ is a local $\ell_1$ subspace with
probability at least

$$1 - 2d^2 \exp\left(\frac{-q N}{8 \cdot d^2 \cdot (d+2)^2}\right) - 2dD \exp\left(\frac{-q^2 N R_1^2}{8 \cdot (1-q) \cdot d^2 \cdot (d+2)^2 \cdot D \cdot R_0^2}\right).$$

$^1$For example, the constants $\gamma_3$ and $\gamma_4$ in the proof of this
theorem are very difficult to estimate.

In the noisy case, exact asymptotic recovery is not possible in general (as we
explain in §3.7), but we can extend the above formulation to near recovery. Our
estimates are expressed in terms of a constant depending on $\mu_1$ and its
parameters (see §A.1 for its definition). For uniform $\mu_1$ we show in
Appendix A.2 that
> ^ (lliiliMl) . (7, 

In this special case, we can replace this constant in all estimates below by the
RHS of (7) and obtain slightly weaker estimates, which are easier to interpret.

Theorem 1.2. If $\epsilon > 0$, $\mu_\epsilon$ is a spherically symmetric HLM
measure on $\mathbb{R}^D$ of noise level $\epsilon$ with $K$ $d$-subspaces
$\{L_i^*\}_{i=1}^K \subset G(D, d)$ and mixture coefficients
$\{\alpha_i\}_{i=0}^K$ satisfying (6), $\mathcal{X}$ is a data set of $N$ points
sampled identically and independently from $\mu_\epsilon$ and $0 < p \leq 1$, then
the global $\ell_p$ subspace for $\mathcal{X}$ is in the ball $B_G(L_1^*, f)$,
where

f = f{^,K,d,p,ao,ai,pi) ^ ^ 1^ 1 —, (8) 

(ao + 2 • ai - l)p • (1 - pi{{0}))^ ■ 2" 

w.p. at least $1 - C\exp(-N/C)$, where $C$ is a constant depending on $\epsilon$,
$p$, $d$, $D$, $\mu_1$, $\alpha_0$, $\alpha_1$ and
$\min_{2 \leq i \leq K}(\mathrm{dist}_G(L_1^*, L_i^*))$.

If $K = 1$, then the above statement extends to $1 < p < \infty$ with



f = f{e,K,d,p,ai,ni) ^ — ^ ■ —. (9) 



•(l-/.i({0}))-2 



The estimates of Theorem 1.2 were formulated for the worst case scenario of
$\mu_0$. Therefore, (8) and (9) are independent of $\mu_0$ (under the assumption
$\mathrm{supp}(\mu_0) \subseteq B(0, 1)$, which follows from (5)). We also note
that if $f > \pi\sqrt{d}/2$, then, since all principal angles are at most
$\pi/2$, $B_G(L_1^*, f) = G(D, d)$. The theorem is thus only interesting when
$\epsilon$ is sufficiently small, in particular when it satisfies the following
bound, which ensures that $f < \pi\sqrt{d}/2$:

{ (ao+2-ai-l)p-(l-Mi({0}))P if p < I- 
2*-Ci , ' " ' (10) 

aWi-m({0}))-2--- ifp>landA-=l. 

At last, we formulate the impossibility of $\ell_p$ recovery when $p > 1$ and
$K > 1$ and thus demonstrate a phase transition at $p = 1$ when $K > 1$. This
result does not require $\mu_0$ to be spherically symmetric.

Theorem 1.3. Assume that $\{L_i^*\}_{i=1}^K$ are $K$ $d$-subspaces in
$\mathbb{R}^D$, which are identically and independently distributed according to
$\gamma_{D,d}$. For each $\epsilon \geq 0$ and a random sample of
$\{L_i^*\}_{i=1}^K$, let $\mu_\epsilon$ be a weakly spherically symmetric HLM
measure on $\mathbb{R}^D$ (w.r.t. $\{L_i^*\}_{i=1}^K$) of noise level $\epsilon$
and let $\mathcal{X}$ be a data set of $N$ points sampled identically and
independently from $\mu_\epsilon$. If $K > 1$ and $p > 1$, then for almost every
$\{L_i^*\}_{i=1}^K$ (w.r.t. $\gamma_{D,d}^K$), there exist positive constants
$\delta_0$ and $\kappa_0$, independent of $N$, such that for any
$0 \leq \epsilon < \delta_0$ the global $\ell_p$ subspace of $\mathcal{X}$ is not
in the ball $B_G(L_1^*, \kappa_0)$ with overwhelming probability.

Later in §3.6.5 we provide estimates for $\delta_0$ and $\kappa_0$, which are
independent of $\epsilon$. They require some technical definitions, which we would
rather avoid here. Instead, we exemplify them for the special case where $K = 2$,
$d = 1$, $D = 2$ and $\mu_1$ and $\mu_2$ are uniform distributions on line
segments centered on the origin and of length 2. Denoting by $\theta$ the angle
between $L_1^*$ and $L_2^*$, the analysis in §3.6.5 implies the following lower
bound for both $\kappa_0$ and $\delta_0$ in this special case:



These lower bounds for $\delta_0$ and $\kappa_0$ approach zero when $\alpha_2$
approaches zero or when $\theta$ approaches $0$ or $\pi/2$. We expect such a
behavior since if $\alpha_2 = 0$, $\theta = 0$ or $\theta = \pi/2$, then for any
$p > 1$, $L_1^*$ is the unique global $\ell_p$ minimizer w.o.p. We also comment
that these bounds are not sharp (in particular, their discontinuity at $p = 2$ is
artificial).

1.6. Relevance of Theory

As discussed in §1.1, the geometric $\ell_1$ minimization is a prototype for other
robust and convex PCA algorithms [36, 37, 23, 40, 18]. Without any control on the
fraction of outliers, no guarantees are known for the exact recovery of the other
algorithms. We thus find it interesting to analyze the robustness of the geometric
$\ell_1$ minimization to spherically symmetric outliers with no restriction on
their fraction. It is also interesting for us to quantify the phase transition of
exact recovery at $p = 1$. The analysis of the geometric $\ell_1$ minimization of
this paper has inspired the analysis of [40, 18] and is also directly used in
[19].

We also note that Theorem 1.1 can be repetitively applied to justify sequential
HLM using $\ell_p$ minimization with $0 < p \leq 1$ in the setting of i.i.d.
samples from a spherically symmetric HLM measure with no noise satisfying
$\alpha_i > \sum_{j=i+1}^{K} \alpha_j$ for all $1 \leq i \leq K - 1$. Furthermore,
the proof of Theorem 1.1 can justify the use of a variant of the $\ell_2$ loss
function in the RANSAC setting of recovering a single subspace in [30, 31] with
i.i.d. samples from our model when $K = 1$. However, the proof of Theorem 1.3 (in
particular, (99)) shows that when $K > 1$ the subspace obtained by the minimizer
of this variant is different than the global $\ell_0$ subspace w.p. 1.

1.7. Additional Results and Structure of the Paper

Additional theory is reviewed in §2. In particular, §2.1 establishes some
necessary and sufficient deterministic conditions for a $d$-subspace to be a local
$\ell_p$ minimizer for a given data set; §2.2 uses these conditions to show that
if one samples $N_0$ i.i.d. outliers from $\mu_0$ and $N_1$ i.i.d. inliers from
$\mu_1$ and if $N_0 = o(N_1^2)$, then the global $\ell_0$ subspace is a local
$\ell_1$ subspace (it also considers the effect of different radii of the supports
of $\mu_0$ and $\mu_1$). On the other hand, it shows that in a general setting of
a single underlying subspace with outliers, the global $\ell_0$ subspace is a
local $\ell_p$ subspace w.p. 0 when $p > 1$ and w.p. 1 when $0 < p < 1$; §2.3
demonstrates natural instances, distinct from the case



( 




ifp > 2; 

• smP{9) ■ cos^ (9), if 1 < p < 2. 

(11)
of spherically symmetric outliers, where the global $\ell_0$ subspace is neither a
local $\ell_p$ subspace (even for $p = 1$) nor a global one (even for
$0 < p < 1$). We separately include all mathematical details verifying the theory
of this paper in §3, while leaving some auxiliary verifications to the appendix.
At last, §4 concludes this paper and discusses some immediate extensions of its
results.



2. Additional Theory

2.1. Combinatorial Conditions for $\ell_0$ Subspaces Being Local $\ell_p$ Subspaces

2.1.1. Preliminary Notation

We denote the orthogonal group of $n \times n$ matrices by $O(n)$ and the
semigroup of $n \times n$ nonnegative diagonal matrices by $S_+(n)$. We designate
the projection from $\mathbb{R}^D$ onto the $d$-subspace $L$ by $P_L$ and the
corresponding projection onto the orthogonal complement of $L$ by $P_L^\perp$. We
represent them by $d \times D$ and $(D - d) \times D$ matrices respectively. Only
in few places
in the text we use D x D matrix representations instead and thus denote them by Pl 

and instead (where PJ^Pl = Pl and P^^ P^ = Pl)- The nuclear norm of A is 
denoted by $\|A\|_*$. We define the scaled outlying "correlation" matrix
$B_{L,\mathcal{X}}$ of a data set $\mathcal{X}$ and a $d$-subspace $L$ as follows

$$B_{L,\mathcal{X}} = \sum_{x \in \mathcal{X} \setminus L} P_L(x)\, P_L^\perp(x)^T / \mathrm{dist}(x, L). \tag{12}$$

That is, unlike the covariance matrix, which sums over all data points the rank
one matrices $xx^T$, $B_{L,\mathcal{X}}$ sums over all outlying data points (i.e.,
$x \in \mathcal{X}$ not lying on $L$) the restriction of $xx^T$ to matrices with
column space in $L$ and row space in the orthogonal complement of $L$, while
dividing this product by the distance of $x$ to $L$, i.e., $\|P_L^\perp(x)\|$,
where throughout the paper $\|\cdot\|$ denotes the Euclidean norm.
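
The definition (12) can be sketched numerically as follows. This is an illustrative NumPy sketch with a hypothetical function name; the basis of the orthogonal complement is obtained from a full SVD, so the sign of the result depends on that arbitrary choice of basis.

```python
import numpy as np

def scaled_outlying_matrix(X, Q, tol=1e-12):
    """B_{L,X} of (12). X: (N, D) array of points (rows); Q: (D, d) orthonormal basis of L."""
    X = np.asarray(X, dtype=float)
    Q = np.asarray(Q, dtype=float)
    U, _, _ = np.linalg.svd(Q, full_matrices=True)
    Q_perp = U[:, Q.shape[1]:]              # orthonormal basis of the complement of L
    P = X @ Q                               # coordinates of P_L(x), shape (N, d)
    P_perp = X @ Q_perp                     # coordinates of P_L^perp(x), shape (N, D - d)
    dist = np.linalg.norm(P_perp, axis=1)   # dist(x, L)
    off = dist > tol                        # keep only the points not lying on L
    return (P[off] / dist[off, None]).T @ P_perp[off]

# Hypothetical planar data: three points on the x-axis and one point at angle theta0.
a, t0, theta0 = [1.0, -2.0, 0.5], 2.0, np.pi / 3
x = [t0 * np.cos(theta0), t0 * np.sin(theta0)]
X = [[ai, 0.0] for ai in a] + [x]
B_axis = scaled_outlying_matrix(X, np.array([[1.0], [0.0]]))
B_line = scaled_outlying_matrix(X, np.array([[np.cos(theta0)], [np.sin(theta0)]]))
print(abs(B_axis[0, 0]), abs(B_line[0, 0]))
```

For these planar data the two magnitudes should equal $t_0\cos(\theta_0)$ and $\cos(\theta_0)\sum_i |a_i|$ respectively.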

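We exemplify $B_{L,\mathcal{X}}$ for a typical counterexample of robust recovery,
which we discuss later in §2.3.

Example 1. Let $D = 2$, $d = 1$, $x = (t_0\cos(\theta_0), t_0\sin(\theta_0))^T$,
where $t_0 > 0$ and $0 < \theta_0 \leq \pi/2$, and
$\mathcal{X} = \{(a_1, 0)^T, (a_2, 0)^T, \ldots, (a_{N_1}, 0)^T, x\}$. That is,
$\mathcal{X}$ is a set of $N_1 + 1$ points, where $N_1$ of them lie on the
$x$-axis with magnitudes $\{|a_i|\}_{i=1}^{N_1}$ and one of them has an angle
$\theta_0$ with the $x$-axis and magnitude $t_0$. We denote the $x$-axis by $L_0$
and the line passing through the origin and $x$ by $L'$.
We note that

$$\begin{aligned}
B_{L_0,\mathcal{X}} &= P_{L_0}(x)\, P_{L_0}^\perp(x)^T \, \mathrm{dist}(x, L_0)^{-1}
 = \frac{t_0\cos(\theta_0)\, t_0\sin(\theta_0)}{\mathrm{dist}((t_0\cos(\theta_0), t_0\sin(\theta_0))^T, L_0)} \\
&= t_0\cos(\theta_0)\, t_0\sin(\theta_0) / (t_0\sin(\theta_0)) = t_0\cos(\theta_0)
\end{aligned} \tag{13}$$

and

$$\begin{aligned}
B_{L',\mathcal{X}} &= \sum_{x \in \mathcal{X} \setminus L'} P_{L'}(x)\, P_{L'}^\perp(x)^T \, \mathrm{dist}(x, L')^{-1} \\
&= \sum_{i=1}^{N_1} P_{L'}((a_i, 0)^T)\, P_{L'}^\perp((a_i, 0)^T)^T / \mathrm{dist}((a_i, 0)^T, L') \\
&= \sum_{i=1}^{N_1} a_i\cos(\theta_0)\, a_i\sin(\theta_0) / |a_i\sin(\theta_0)| = \cos(\theta_0) \sum_{i=1}^{N_1} |a_i|.
\end{aligned} \tag{14}$$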

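2.1.2. Conditions for a Local $\ell_p$ Minimizer

We formulate conditions for an arbitrary $d$-subspace $L_1$ to be a local $\ell_p$
subspace, while distinguishing between three cases: $p = 1$, $0 < p < 1$ and
$p > 1$.

Theorem 2.1. If $L_1 \in G(D, d)$, $\mathcal{X}_1 = \{x_i\}_{i=1}^{N_1} \subset L_1$,
$\mathcal{X}_0 = \{y_i\}_{i=1}^{N_0} \subset \mathbb{R}^D \setminus L_1$ and
$\mathcal{X} = \mathcal{X}_0 \cup \mathcal{X}_1$, then a sufficient condition for
$L_1$ to be a local $\ell_1$ $d$-subspace is that for any $V \in O(d)$ and
$C \in S_+(d)$:

$$\sum_{i=1}^{N_1} \|C V P_{L_1}(x_i)\| > \|C V B_{L_1,\mathcal{X}}\|_*. \tag{15}$$

Furthermore, a necessary condition is that for any $V \in O(d)$ and
$C \in S_+(d)$:

$$\sum_{i=1}^{N_1} \|C V P_{L_1}(x_i)\| \geq \|C V B_{L_1,\mathcal{X}}\|_*. \tag{16}$$

Proposition 2.1. If $L_1 \in G(D, d)$, $\mathcal{X}_1 = \{x_i\}_{i=1}^{N_1} \subset L_1$,
$\mathcal{X}_0 = \{y_i\}_{i=1}^{N_0} \subset \mathbb{R}^D \setminus L_1$,
$\mathrm{Sp}(\{x_i\}_{i=1}^{N_1}) = L_1$, $\mathcal{X} = \mathcal{X}_0 \cup \mathcal{X}_1$
and $p < 1$, then $L_1$ is a local minimum of $e_{\ell_p}(\mathcal{X}, L)$ among
all $L \in G(D, d)$.

Proposition 2.2. If $L_1 \in G(D, d)$, $\mathcal{X}_1 = \{x_i\}_{i=1}^{N_1} \subset L_1$,
$\mathcal{X}_0 = \{y_i\}_{i=1}^{N_0} \subset \mathbb{R}^D \setminus L_1$,
$\mathcal{X} = \mathcal{X}_0 \cup \mathcal{X}_1$ and $p > 1$, then a necessary
condition for $L_1$ to be a local minimum of $e_{\ell_p}(\mathcal{X}, L)$ among
all $L \in G(D, d)$ is

$$\sum_{i=1}^{N_0} P_{L_1}(y_i)\, P_{L_1}^\perp(y_i)^T \, \mathrm{dist}(y_i, L_1)^{p-2} = 0. \tag{17}$$

This statement is also true when $\mathcal{X}_1 = \emptyset$ and $0 < p \leq 1$.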

The above conditions follow from differentiating the corresponding energy function
(along geodesics) and using the resulting derivative to form necessary and
sufficient conditions for a local minimum (see their proof in §3.2). However,
intuitively it is hard to explain their expressions without going through all
calculations. Instead, we exemplify them for the special case presented in
Example 1.

Example 2. We follow the setting of Example 1. Let us first simplify (15) (or
equivalently (16)) in this example. If $L_1 = L_0$ (i.e., it is the $x$-axis),
then the set of outliers is $\mathcal{X}_0 = \{x\}$ and the inliers are
$\mathcal{X}_1 = \mathcal{X} \setminus \mathcal{X}_0$. Since $d = 1$,
$V \in O(d)$ is either $1$ or $-1$ and $C$ is a positive constant $c$. The LHS of
(15) thus has the form

$$\sum_{i=1}^{N_1} \|C V P_{L_1}(x_i)\| = c \sum_{i=1}^{N_1} |a_i|$$

and computing $B_{L_1,\mathcal{X}}$ as in (13), the RHS has the form

$$\|C V B_{L_1,\mathcal{X}}\|_* = c\, t_0\cos(\theta_0).$$

Therefore, a sufficient condition for $L_0$ to be a local $\ell_1$ line is

$$\sum_{i=1}^{N_1} |a_i| > t_0\cos(\theta_0).$$

If $L_1 = L'$ (i.e., it is the line passing through $x$ and the origin), then
$\mathcal{X}_1 = \{x\}$ and $\mathcal{X}_0 = \mathcal{X} \setminus \mathcal{X}_1$.
Applying (14) and following similar calculations as above we have that a
sufficient condition for $L'$ to be a local $\ell_1$ line is

$$\cos(\theta_0) \sum_{i=1}^{N_1} |a_i| < t_0.$$

If on the other hand $L_1$ does not pass through any point in $\mathcal{X}$, then
$\mathcal{X}_1 = \emptyset$ and $\mathcal{X}_0 = \mathcal{X}$. Therefore the LHS
of (15) is $0$ and thus (15) never holds.

All the above conditions are also necessary when their inequalities are not strict
(see (16)).

We thus note that if $\theta_0 = \pi/2$, then both $L_0$ and $L'$ are the only two
local $\ell_1$ lines (assuming the obvious conditions: $t_0 > 0$ and
$\sum_{i=1}^{N_1} |a_i| > 0$). If on the other hand $0 < \theta_0 < \pi/2$, then
$L_0$ is a local $\ell_1$ line if $\sum_{i=1}^{N_1} |a_i| / t_0 > \cos(\theta_0)$
and $L'$ is a local $\ell_1$ line if
$\sum_{i=1}^{N_1} |a_i| / t_0 < 1/\cos(\theta_0)$ (we also recall that for
necessary conditions we relax the strict inequalities). Therefore, for fixed
$0 < \theta_0 < \pi/2$ at least one of $L_0$ or $L'$ is a local $\ell_1$ line and
there are no other local minimizers. If $t_0$ is sufficiently large, then $L'$ is
the global $\ell_1$ line and if $t_0$ is sufficiently small, then $L_0$ is the
global $\ell_1$ line.

Next, we note that Proposition 2.1 implies that both $L_0$ and $L'$ are local
$\ell_p$ lines when $0 < p < 1$ (as long as $N_1 \neq 0$ and one of the $a_i$'s is
not zero).

At last, we explore the necessary condition of Proposition 2.2. If $L_1 = L_0$,
then the LHS of (17) is $t_0^p\cos(\theta_0)(\sin(\theta_0))^{p-1}$. Therefore,
(17) holds in this case only when $\theta_0 = \pi/2$. Similarly, if $L_1 = L'$,
then (17) holds when $\theta_0 = \pi/2$. If $L_1$ does not contain any point of
$\mathcal{X}$ and has an angle $0 < \theta < \pi/2$ with the $x$-axis, then (17)
holds when

$$\cos\theta\,(\sin\theta)^{p-1} \sum_{i=1}^{N_1} |a_i|^p + t_0^p \cos(\theta - \theta_0)(\sin(\theta - \theta_0))^{p-1} = 0.$$
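
The two sufficient conditions for local $\ell_1$ lines derived in Example 2 can be checked numerically in the plane. The following is a minimal sketch under the setting of Example 1; the function names, the step size `h` and the numerical values of $a_i$, $t_0$ and $\theta_0$ are assumptions made for illustration. A line through the origin is parametrized by its angle `phi` with the $x$-axis.

```python
import math

def l1_energy_of_line(phi, a, t0, theta0):
    """l_1 energy (1) of the line at angle phi for the data set of Example 1."""
    on_axis = sum(abs(ai) for ai in a) * abs(math.sin(phi))   # points (a_i, 0)
    outlier = t0 * abs(math.sin(theta0 - phi))                 # the point x
    return on_axis + outlier

def is_strict_local_min(phi, a, t0, theta0, h=1e-4):
    """Compare the energy at phi with the energies of two nearby lines."""
    e = l1_energy_of_line(phi, a, t0, theta0)
    return (e < l1_energy_of_line(phi - h, a, t0, theta0)
            and e < l1_energy_of_line(phi + h, a, t0, theta0))

a, theta0 = [1.0, -2.0, 0.5], math.pi / 3      # sum |a_i| = 3.5, cos(theta0) = 0.5
s = sum(abs(ai) for ai in a)
for t0 in (1.0, 2.0, 10.0):
    print(t0,
          s > t0 * math.cos(theta0), is_strict_local_min(0.0, a, t0, theta0),
          math.cos(theta0) * s < t0, is_strict_local_min(theta0, a, t0, theta0))
```

In each printed row the first pair of values refers to $L_0$ and the second pair to $L'$; within each pair, the analytic condition and the numerical check should agree.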
2.2. Local $\ell_p$ Subspaces for Probabilistic Settings with a Single Subspace

We exemplify how to use the conditions of §2.1.2 in a probabilistic setting with a
single underlying subspace. We first assume inliers sampled from a boundedly
supported spherically symmetric measure within a $d$-subspace $L_1^*$ and outliers
sampled from a boundedly supported spherically symmetric distribution on
$\mathbb{R}^D$. In some cases we even significantly relax the assumptions on
$\mu_0$ and $\mu_1$.

For any $p > 0$, we determine whether $L_1^*$, which is the global $\ell_0$
subspace w.o.p., is also a local $\ell_p$ subspace w.o.p. Our proofs appear in
§3.3.

We first claim that for $p = 1$ the global $\ell_0$ subspace is a local $\ell_p$
subspace w.o.p. as long as the fraction of inliers is sufficiently large.

Theorem 2.2. If $L_1^* \in G(D, d)$, $R_0, R_1 > 0$ and $\mathcal{X}$ is a data
set in $\mathbb{R}^D$ of $N_0 + N_1$ points, where $N_0$ of them are identically
and independently sampled from a spherically symmetric distribution $\mu_0$ on
$B(0, R_0)$ and $N_1$ of them are identically and independently sampled from a
spherically symmetric distribution $\mu_1$ on $L_1^* \cap B(0, R_1)$ with
nontrivial support, then $L_1^*$ is a local $\ell_1$ subspace of $\mathcal{X}$
w.p. at least

^-'^'^"P (2^^) -'^^^"P ( 2 -'d^-'D- Rl ) ' vv/..r.,,+ ^e<<5.(^i), 

(18) 

where $\delta_*(\mu_1)$ is a constant depending only on $\mu_1$.

In particular, if $N_0 = o(N_1^2)$, then $L_1^*$ is a local $\ell_1$ subspace of
$\mathcal{X}$ w.p. at least

'--^-(-^)- — (-^^^)^ 

In Appendix A.3 we establish the following expression for the constant
$\delta_*(\mu_1)$ in the special case where $\mu_1$ is the uniform distribution on
$L_1^* \cap B(0, R_1)$:

$$\delta_*(\mu_1) = R_1/(d + 2). \tag{20}$$

In this special case, we note that when $R_1 N_1 \ll R_0 \sqrt{N_0}$, the lower
bound for the probability in (19) is actually negative and thus meaningless. In
the case of global $\ell_p$ recovery of the global $\ell_0$ subspace (as in
Theorem 1.1), we are unable to formulate precise probabilistic bounds, but it is
important to keep in mind that in all of our probabilistic estimates the required
percentage of inliers increases as the ratio of the radius of the outliers to the
radius of the inliers increases.

The following proposition shows that for $p > 1$ and a rather general setting with
a single underlying subspace the global $\ell_0$ subspace is a local $\ell_p$
subspace w.p. 0.

Proposition 2.3. Assume that $D > d + 1$, $L_1^* \in G(D, d)$, $\mu_0$ is a
distribution on $\mathbb{R}^D$ such that $\mu_0(\{L\}) \neq 0$ for any affine
subspace $L$, where $L \subset \mathbb{R}^D$, $\mu_1$ is a distribution on $L_1^*$
and $\mu = \alpha_0\mu_0 + \alpha_1\mu_1$, where $\alpha_0$, $\alpha_1$ are
nonnegative numbers summing to 1. If $\mathcal{X}$ is a data set sampled
identically and independently from $\mu$ and $p > 1$, then the probability that
$L_1^*$ is a local $\ell_p$ subspace of $\mathcal{X}$ is 0.

If on the other hand, $0 < p < 1$ and $\mathcal{X}$ is generated in the same way
as in Proposition 2.3, then Proposition 2.1 implies that w.o.p. $L_1^*$ is a local
$\ell_p$ subspace.

The phase transition phenomenon demonstrated above at $p = 1$ is rather artificial
in the current setting. Indeed, this phase transition is based on the fact that
(17) holds w.p. 0 for $p > 1$ and any finite sample. However, if $\mu_0$ is
symmetric with respect to $L_1^*$, then the expectation of the LHS of (17) is zero
and thus the LHS of (17) divided by $N_0$ approaches 0 w.p. 1 as $N_0$ approaches
infinity. Therefore, when $p > 1$ the distance between the global $\ell_0$
subspace and any local $\ell_p$ subspace approaches 0 as $N$ approaches infinity.
Moreover, Theorem 1.2 shows that this formal phase transition also breaks down
with noise. Nevertheless, Theorems 1.1 and 1.3 indicate that there is a clear
phase transition for a spherically symmetric HLM model with $K > 1$. Indeed, in
this case $\mu - \alpha_0\mu_0$ (i.e., $\sum_{i=1}^{K} \alpha_i\mu_i$) is not
symmetric w.r.t. $L_1^*$ (except for few negligible cases of
$\{L_i^*\}_{i=2}^K$).

2.3. Counterexamples for Robustness of Best $\ell_p$ Subspaces

We discuss here basic situations, where global $\ell_p$ $d$-subspaces are not
robust to outliers for all $0 < p < \infty$. More precisely, we show how a single
outlier can completely change the underlying subspace. These cases differ from our
underlying model of spherically symmetric outliers. We describe below a
probabilistic setting to sample the data, but we only care about a single
counterexample sampled this way. We thus do not pursue statements in high
probability (even though they are correct), but only a positive statement for at
least one of the sampled data sets.

A typical example includes $N_1$ points sampled identically and independently from
a uniform distribution on $B(0, \epsilon) \cap L^* \subset \mathbb{R}^D$, where
$L^*$ is a $d$-subspace of $\mathbb{R}^D$, and an additional outlier located on a
unit vector orthogonal to $L^*$. By choosing $\epsilon$ sufficiently small
(relative to $N_1$), the global $\ell_p$ subspace passes through the single
outlier and is orthogonal to the initial $d$-subspace, which is the global
$\ell_0$ $d$-subspace, for all $p > 0$.
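
The effect of the single outlier can be seen in a small planar computation. The sketch below is restricted to $D = 2$, $d = 1$ and to the three values $p \in \{0.5, 1, 2\}$; the deterministic (rather than random) placement of the inliers, the grid over line angles and all numerical values are assumptions made for illustration. The inliers are $(a_i, 0)$ with $|a_i| \leq \epsilon$ and the outlier is $(0, 1)$.

```python
import math

def lp_energy_of_line(phi, a, p):
    """l_p energy (1) of the line at angle phi for the points (a_i, 0) and the outlier (0, 1)."""
    inliers = sum(abs(ai * math.sin(phi)) ** p for ai in a)
    outlier = abs(math.cos(phi)) ** p
    return inliers + outlier

n1, eps = 100, 1e-6
a = [eps * (2.0 * k / (n1 - 1) - 1.0) for k in range(n1)]   # evenly spread on [-eps, eps]
grid = [k * math.pi / 1000 for k in range(1000)]             # line angles in [0, pi)

for p in (0.5, 1.0, 2.0):
    best = min(grid, key=lambda phi: lp_energy_of_line(phi, a, p))
    print(p, best, lp_energy_of_line(0.0, a, p), lp_energy_of_line(best, a, p))
```

The $x$-axis contains 100 of the 101 points and has energy 1 for every $p$, whereas the minimizing angle on the grid should be the one of the $y$-axis, which contains only the outlier. The required smallness of $\epsilon$ depends on $p$: with these inliers the energy of the $y$-axis is at most $N_1\epsilon^p$.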

If $p = 1$, then the global $\ell_0$ $d$-subspace in this example is still a local
$\ell_1$ subspace (as explained in Example 2 for the special case $d = 1$ and
$D = 2$). Nevertheless, if the outlier is located instead on a unit vector having
elevation angle with the original $d$-subspace less than $\pi/2$, then $\epsilon$
can be chosen so that the global $\ell_0$ subspace is not even a local $\ell_1$
subspace (see again Example 2). However, if $0 < p < 1$, then Proposition 2.1
implies that the global $\ell_0$ subspace is still a local $\ell_p$ subspace in
both examples.

Similarly, it is not hard to produce an example of data points on the unit sphere
of $\mathbb{R}^D$ where the global $\ell_0$ subspace is still not a global
$\ell_p$ subspace for all $p > 0$ (normalization of data to the unit sphere is a
common practice for robust PCA algorithms [20, 18] as well as HLM algorithms
[38, 41]). For simplicity we give a counterexample for $d = 2$ by letting $N_1$
data points be uniformly sampled along an arc of length $\epsilon$ of a great
circle of the sphere. We then place an outlier on another great circle in the
location furthest from the intersection of the two great circles. For any fixed
$p > 0$ and $\epsilon$ sufficiently small, we note that the global $\ell_p$
subspace passes through the outlier and the center of the arc of length $\epsilon$
and is orthogonal to the subspace containing this arc. Clearly, it is different
than the global $\ell_0$ subspace.

3. Verification of Theory

We describe here the proofs of the theorems and propositions of this paper
according to the following order of sections: §2.1, §2.2 and §1.

3.1. Preliminaries 

3.1.1. Basic Notation and Conventions 

We denote the Frobenius dot product and norm by $\langle A, B \rangle_F$ and
$\|A\|_F$, that is, $\langle A, B \rangle_F = \mathrm{tr}(A^T B)$ and
$\|A\|_F = \sqrt{\langle A, A \rangle_F}$. The $n \times n$ identity matrix is
written as $I_n$. We denote the subset of $S_+(n)$ with Frobenius norm 1 by
$NS_+(n)$. If $m > n$ we let
$O(m, n) = \{X \in \mathbb{R}^{m \times n} : X^T X = I_n\}$, whereas if $n > m$,
$O(m, n) = \{X \in \mathbb{R}^{m \times n} : X X^T = I_m\}$.

We sometimes apply the energy (1) to a single point $x$, while using the notation:

$$e_{\ell_p}(x, L) = e_{\ell_p}(\{x\}, L).$$

3.1.2. Auxiliary Lemmata 

We formulate several technical lemmata, which will be proved in Appendices A.4-A.6. 

Lemma 3.1. If Li, Li G G{D, d), p > 0, fii is a spherically symmetric measure on 
Li with bounded and nontrivial support and distG{Li, Li) > e, then 



Lemma 3.2. For any $x \in \mathbb{R}^D$ and $L_1, L_2 \in G(D, d)$:

$$|\mathrm{dist}(x, L_1) - \mathrm{dist}(x, L_2)| \leq \|x\| \, \mathrm{dist}_G(L_1, L_2).$$

Lemma 3.3. If $L_1, L_2 \in G(D, d)$, $\mu_1$ and $\mu_2$ are distributions
supported within $L_1$ and $L_2$ respectively and created by an appropriate
rotation of the same distribution, which is spherically symmetric within a
$d$-subspace and has a bounded and nontrivial support, and $p \leq 1$, then for
any $L \in G(D, d)$:

$$\begin{aligned}
& E_{\mu_1}(\mathrm{dist}(x_1, L)^p) + E_{\mu_2}(\mathrm{dist}(x_2, L)^p) \\
& \quad \geq E_{\mu_1}(\mathrm{dist}(x_1, L_i)^p) + E_{\mu_2}(\mathrm{dist}(x_2, L_i)^p) \quad \text{for } i = 1, 2.
\end{aligned} \tag{21}$$

3.2. Proofs for the Theory of §2.1: Combinatorial Conditions via Calculus on the
Grassmannian

3.2.1. Preliminaries: Principal Angles, Principal Vectors, Representation of the
Grassmannian and Geodesics on the Grassmannian

We frequently use here principal angles and for completeness we present one of
their equivalent definitions (§12.4.3 of [13] provides additional background on
principal angles). For two $d$-subspaces $F$ and $G$ with corresponding
orthonormal bases stored as columns of the matrices
$Q_F, Q_G \in \mathbb{R}^{D \times d}$ respectively, the principal angles
$\pi/2 \geq \theta_1 \geq \theta_2 \geq \cdots \geq \theta_d \geq 0$ are obtained
by

$$\theta_i = \arccos(\sigma_{d-i+1}(Q_G^T Q_F)), \quad i = 1, \ldots, d, \tag{22}$$

where $\sigma_{d-i+1}(Q_G^T Q_F)$ is the $(d - i + 1)$th largest singular value of
the matrix $Q_G^T Q_F$. We remark that we order the principal angles decreasingly,
unlike the common agreement [13] (§12.4.3), where $\sigma_{d-i+1}$ in (22) is
replaced by $\sigma_i$.

We denote by $k = k(F, G)$ the largest number such that $\theta_k \ne 0$, so that
$\theta_1 \ge \cdots \ge \theta_k > \theta_{k+1} = \cdots = \theta_d = 0$. We refer to this number as the interaction
dimension and reserve the index $k$ for denoting it (the subspaces $F$ and $G$ will be clear
from the context). We recall that the principal vectors $\{v_i\}_{i=1}^d$ and $\{v'_i\}_{i=1}^d$ of $F$ and $G$
respectively are two orthogonal bases for $F$ and $G$ satisfying

$$\langle v_i, v'_i \rangle = \cos(\theta_i), \quad \text{for } i = 1, \ldots, d,$$

and

$$v_i \perp v'_j, \quad \text{for all } 1 \le i \ne j \le k.$$

We define the complementary orthogonal system $\{u_i\}_{i=1}^d$ for $G$ with respect to $F$
by the formula:

$$v'_i = \cos(\theta_i) v_i + \sin(\theta_i) u_i, \quad i = 1, 2, \ldots, k; \qquad u_i = v_i, \quad i = k+1, \ldots, d. \tag{23}$$

Clearly,

$$u_i \perp v_j \quad \text{for all } 1 \le i, j \le k.$$

We note that $F + G$ can be decomposed using these principal vectors as follows:

$$F + G = \mathrm{Sp}(v_1, u_1) \oplus \mathrm{Sp}(v_2, u_2) \oplus \cdots \oplus \mathrm{Sp}(v_k, u_k) \oplus (F \cap G),$$

where $\oplus$ denotes an orthogonal sum (i.e., any two subspaces of the sum are orthogonal).
Therefore, the interaction between $F$ and $G$ can be described only within these
2-dimensional subspaces $\mathrm{Sp}(v_i, u_i)$ (equivalently, $\mathrm{Sp}(v_i, v'_i)$) via the principal angles.
This idea is also motivated by purely geometric intuition in §2 of [35].

It follows from [35, Theorem 9] that if the largest principal angle between $F$ and
$G$ is less than $\pi/2$, then there is a unique geodesic line between them. Following [11,
Theorem 2.3], we can parametrize this line from $F$ to $G$ by the following function
$L: [0,1] \to G(D,d)$, which is expressed in terms of the principal angles $\{\theta_i\}_{i=1}^d$ of $F$
and $G$, the principal vectors $\{v_i\}_{i=1}^d$ of $F$ and the complementary orthogonal system
$\{u_i\}_{i=1}^d$ of $G$ with respect to $F$:

$$L(t) = \mathrm{Sp}(\{\cos(t\theta_i) v_i + \sin(t\theta_i) u_i\}_{i=1}^d). \tag{24}$$

The length of this geodesic line is clearly expressed by the distance $\mathrm{dist}_G$ of (2). We
remark that (24) only holds when equipping the Grassmannian with this distance.
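
To make the parametrization (24) concrete, here is a minimal sketch for the simplest case $d = 1$, i.e., two lines through the origin in $\mathbb{R}^3$. It computes the single principal angle, the complementary vector $u$ from (23), and a spanning vector of $L(t)$. The function names are hypothetical and the code is only an illustration of the formulas above under the assumption $d = 1$.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def normalize(a):
    n = math.sqrt(dot(a, a))
    return [p / n for p in a]


def geodesic_between_lines(f, g):
    """Return (theta, path), where path(t) spans L(t) of (24) for the lines Sp(f), Sp(g)."""
    v = normalize(f)
    w = normalize(g)
    c = dot(v, w)
    if c < 0:
        # A line has two unit spanning vectors; pick the one giving an angle in [0, pi/2].
        w = [-p for p in w]
        c = -c
    theta = math.acos(min(1.0, c))
    if theta == 0.0:
        u = v  # convention of (23) when the principal angle vanishes
    else:
        # Solve v' = cos(theta) v + sin(theta) u for the complementary vector u.
        u = [(wi - c * vi) / math.sin(theta) for wi, vi in zip(w, v)]

    def path(t):
        return [math.cos(t * theta) * vi + math.sin(t * theta) * ui
                for vi, ui in zip(v, u)]

    return theta, path
```

For example, `geodesic_between_lines([1, 0, 0], [1, 1, 0])` gives an angle of $\pi/4$, with `path(0)` spanning the first line and `path(1)` spanning the second.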
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3.2.2. Proof of Theorem 2.1 

In order to establish quantitative conditions for $L_1$ to be a local minimum of $e_{l_1}(X, L)$
among all $d$-subspaces in $G(D,d)$, we arbitrarily fix a $d$-subspace $L \in B_G(L_1, 1)$ and
check the sign of the derivative of the $l_1$ energy when restricted to the geodesic line
from $L_1$ to $L$. If this derivative is positive then $L_1$ is a local $l_1$ subspace. Similarly, if
$L_1$ is a local $l_1$ subspace then this derivative is nonnegative.

The restriction of $L$ to $B_G(L_1, 1)$ implies that $\theta_1 < 1$ and thus by [35, Theorem 9]
this geodesic line (connecting $L_1$ and $L$) is unique. We parametrize it by the function
$L: [0,1] \to G(D,d)$ of (24), where here $\{\theta_i\}_{i=1}^d$ are the principal angles between $L_1$
and $L$, $\{v_i\}_{i=1}^d$ are the principal vectors of $L_1$ and $\{u_i\}_{i=1}^d$ are the complementary
orthogonal system for $L$ with respect to $L_1$. The necessary and sufficient conditions for
$L_1$ to be a local $l_1$ subspace will be formulated in terms of the sign of the derivative of
$e_{l_1}(X, L(t)): [0,1] \to \mathbb{R}$ at $t = 0$.

We follow by simplifying the expression for the function $e_{l_1}(X, L(t))$ and its derivative
with respect to $t$. We denote the projection from $\mathbb{R}^D$ onto $\mathrm{Sp}(v_j, u_j)$, where
$1 \le j \le d$, by $P_j$ and the projection from $\mathbb{R}^D$ onto $(L_1 + L)^\perp$ by $P^\perp$ and use this notation to
express the following components of the function $e_{l_1}(X_0, L(t))$, for $i = 1, \ldots, N_0$ (we
later express the components of $e_{l_1}(X_1, L(t))$); the terms with $j > k$ vanish, since then
$P_j(y_i) \in L(t)$:

$$\mathrm{dist}(y_i, L(t)) = \sqrt{\sum_{j=1}^{k} \mathrm{dist}^2(P_j(y_i), L(t)) + \mathrm{dist}^2(P^\perp(y_i), L(t))}$$
$$= \sqrt{\sum_{j=1}^{k} \big((-\sin(t\theta_j) v_j + \cos(t\theta_j) u_j) \cdot y_i\big)^2 + \mathrm{dist}^2(P^\perp(y_i), L(t))}. \tag{25}$$

We differentiate the expression for $\mathrm{dist}(y_i, L(t))$ in (25) for all $1 \le i \le N_0$ as
follows (note that we use the fact that $\mathrm{dist}^2(P^\perp(y_i), L(t))$ is independent of $t$):

$$\frac{d}{dt}\big(\mathrm{dist}(y_i, L(t))\big) = -\frac{\sum_{j=1}^{k} \theta_j \big((\cos(t\theta_j) v_j + \sin(t\theta_j) u_j) \cdot y_i\big)\big((-\sin(t\theta_j) v_j + \cos(t\theta_j) u_j) \cdot y_i\big)}{\mathrm{dist}(y_i, L(t))}. \tag{26}$$

At $t = 0$ it becomes

$$\frac{d}{dt}\big(\mathrm{dist}(y_i, L(t))\big)\Big|_{t=0} = -\frac{\sum_{j=1}^{k} \theta_j (v_j \cdot y_i)(u_j \cdot y_i)}{\mathrm{dist}(y_i, L(0))}. \tag{27}$$

We form the following matrices: $C = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_d)$, $V \in O(d, D)$ with $j$th
row $v_j^T$ and $U \in O(d, D)$ with $j$th row $u_j^T$. We then reformulate (27) using these
matrices as follows:

$$\frac{d}{dt}\big(\mathrm{dist}(y_i, L(t))\big)\Big|_{t=0} = -\frac{\mathrm{tr}(C V y_i y_i^T U^T)}{\mathrm{dist}(y_i, L_1)}. \tag{28}$$

Similarly, we express the components of $e_{l_1}(X_1, L(t))$ for all $x_i \in L_1$, where
$i = 1, \ldots, N_1$:

$$\mathrm{dist}(x_i, L(t)) = \sqrt{\sum_{j=1}^{d} (v_j \cdot x_i)^2 \sin^2(t\theta_j)}$$

and differentiate these expressions as follows

$$\frac{d}{dt}\big(\mathrm{dist}(x_i, L(t))\big) = \frac{\sum_{j=1}^{d} \theta_j (v_j \cdot x_i)^2 \sin(t\theta_j)\cos(t\theta_j)}{\mathrm{dist}(x_i, L(t))}. \tag{29}$$

At $t = 0$, these derivatives become

$$\frac{d}{dt}\big(\mathrm{dist}(x_i, L(t))\big)\Big|_{t=0} = \|C V x_i\|. \tag{30}$$

Combining (28) and (30) and using

$$A := \sum_{i=1}^{N_0} y_i y_i^T / \mathrm{dist}(y_i, L_1),$$

we obtain the following expression for the derivative of the $l_1$ energy of (1):

$$\frac{d}{dt}\big(e_{l_1}(X, L(t))\big)\Big|_{t=0} = \sum_{i=1}^{N_1} \|C V x_i\| - \mathrm{tr}(C V A U^T). \tag{31}$$
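
The derivative formula (31) can be checked numerically in a toy setting. The sketch below assumes $d = 1$ and $D = 3$, so that $C$ is the scalar $\theta$, $\|CVx\| = \theta |v \cdot x|$ and $\mathrm{tr}(CVAU^T) = \theta \sum_i (v \cdot y_i)(u \cdot y_i)/\mathrm{dist}(y_i, L_1)$. The data points and helper names are made up for illustration; the comparison is against a one-sided finite difference of the $l_1$ energy along the geodesic (24).

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dist_to_line(z, w):
    """Distance from z to the line spanned by the unit vector w (norm of the residual)."""
    c = dot(z, w)
    return math.sqrt(sum((zi - c * wi) ** 2 for zi, wi in zip(z, w)))


def l1_energy(points, w):
    return sum(dist_to_line(z, w) for z in points)


def derivative_formula(inliers, outliers, v, u, theta):
    """Right-hand side of (31) specialized to d = 1."""
    inlier_term = sum(theta * abs(dot(v, x)) for x in inliers)
    outlier_term = sum(theta * dot(v, y) * dot(u, y) / dist_to_line(y, v)
                       for y in outliers)
    return inlier_term - outlier_term


def finite_difference(inliers, outliers, v, u, theta, h=1e-6):
    """One-sided difference quotient of the l1 energy along L(t) at t = 0."""
    def span(t):
        return [math.cos(t * theta) * vi + math.sin(t * theta) * ui
                for vi, ui in zip(v, u)]
    points = inliers + outliers
    return (l1_energy(points, span(h)) - l1_energy(points, span(0.0))) / h


# Hypothetical toy data: inliers on L_1 = Sp(v), outliers off L_1.
V1 = [1.0, 0.0, 0.0]
U1 = [0.0, 1.0, 0.0]
THETA = 0.6
INLIERS = [[0.5, 0.0, 0.0], [-0.8, 0.0, 0.0], [0.3, 0.0, 0.0]]
OUTLIERS = [[0.2, 0.4, 0.1], [-0.3, 0.5, -0.2], [0.6, -0.1, 0.7]]
```

Comparing `derivative_formula(INLIERS, OUTLIERS, V1, U1, THETA)` with `finite_difference(INLIERS, OUTLIERS, V1, U1, THETA)` should show agreement up to the discretization error of the difference quotient.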



Replacing $V$ with $\tilde V \in O(d)$, whose $j$th row is $P_{L_1}(v_j)^T$, and $U$ with
$\tilde U \in \mathbb{R}^{d \times (D-d)}$, where $\tilde U^T = [U_1, U_2]$, $U_1 \in O(D-d, k)$, whose $j$th column is
$P_{L_1^\perp}(u_j)$, and $U_2 = 0_{(D-d)\times(d-k)}$, we may rewrite this expression as follows:

$$\frac{d}{dt}\big(e_{l_1}(X, L(t))\big)\Big|_{t=0} = \sum_{i=1}^{N_1} \|C \tilde V P_{L_1}(x_i)\| - \mathrm{tr}(C \tilde V B_{L_1,X} \tilde U^T). \tag{32}$$

We note that

$$\max_{\tilde U}\big(\mathrm{tr}(C \tilde V B_{L_1,X} \tilde U^T)\big) = \|C \tilde V B_{L_1,X}\|_*. \tag{33}$$

Indeed, denoting the thin SVD of $C \tilde V B_{L_1,X}$ by $U_0 \Sigma_0 V_0^T$ we have that

$$\mathrm{tr}(C \tilde V B_{L_1,X} \tilde U^T) = \mathrm{tr}(U_0 \Sigma_0 V_0^T \tilde U^T) = \mathrm{tr}(\Sigma_0 V_0^T \tilde U^T U_0) \le \mathrm{tr}(\Sigma_0) = \|C \tilde V B_{L_1,X}\|_*, \tag{34}$$

and equality is achieved in (34) when $\tilde U^T = V_0 U_0^T$.

The theorem is now easily concluded. Indeed, if (15) is satisfied then it follows from
(32) and (34) that the derivative of $e_{l_1}(X, L(t))$ at $t = 0$ is positive and thus $L_1$ is a
local $l_1$ subspace. If on the other hand $L_1$ is a local $l_1$ subspace, then the derivative of
$e_{l_1}(X, L(t))$ at $t = 0$ is nonnegative for any geodesic line. It thus follows from (32)
and (33) that (16) is satisfied.
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3.2.3. Simultaneous Proof for Both Propositions 2.1 and 2.2 



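For the $d$-subspace $L_1$ and an arbitrary $d$-subspace $L \in B_G(L_1, 1)$, we form the
geodesic line parametrization $L(t)$ and the corresponding matrices $C$, $V$, $U$, $\tilde V$ and
$\tilde U$ as in the proof of Theorem 2.1.

We assume first that $p > 1$ (and thus start with proving the main part of
Proposition 2.2). We note that for $z \in \mathbb{R}^D$

$$\frac{d}{dt}\mathrm{dist}(z, L(t))^p = p \, \mathrm{dist}(z, L(t))^{p-1} \frac{d}{dt}\mathrm{dist}(z, L(t)), \tag{35}$$

where if $z = x_i$, $i = 1, 2, \ldots, N_1$, or $z = y_i$, $i = 1, 2, \ldots, N_0$, then the derivative in
the RHS of (35) can be formulated using (29) or (26) respectively. Applying (27), (30),
(35) and the fact that $\mathrm{dist}(x_i, L_1) = 0$, for $i = 1, 2, \ldots, N_1$, we obtain that

$$\frac{d}{dt}\big(e_{l_p}(X, L(t))\big)\Big|_{t=0} = -p \sum_{i=1}^{N_0} \mathrm{dist}(y_i, L_1)^{p-2} \, \mathrm{tr}(C V y_i y_i^T U^T)$$
$$= -p \sum_{i=1}^{N_0} \mathrm{dist}(y_i, L_1)^{p-2} \, \mathrm{tr}(C \tilde V P_{L_1}(y_i) P_{L_1^\perp}(y_i)^T \tilde U^T). \tag{36}$$

If $L_1$ is a local minimum of $e_{l_p}(X, L)$, then the LHS of (36) is nonnegative. Fixing
$C = \tilde V = I_d$ in the RHS of (36) and using its nonnegativity and then applying (33),
we conclude that

$$0 \ge \max_{\tilde U} \, p \sum_{i=1}^{N_0} \mathrm{dist}(y_i, L_1)^{p-2} \, \mathrm{tr}(P_{L_1}(y_i) P_{L_1^\perp}(y_i)^T \tilde U^T) \tag{37}$$
$$= p \, \Big\| \sum_{i=1}^{N_0} \mathrm{dist}(y_i, L_1)^{p-2} P_{L_1}(y_i) P_{L_1^\perp}(y_i)^T \Big\|_* \tag{38}$$

and consequently that (17) holds. That is, Proposition 2.2 is proved when $p > 1$.
Proposition 2.2 can be similarly proved when $X_1 = \emptyset$ and $0 < p \le 1$. Indeed, (36) still holds
in this case ($X = X_0$).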

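Next, assume that $p < 1$. We note that the derivative of $e_{l_p}(X, L(t))$ at $t = 0$ is
only defined when $p \ge 1$ (indeed, in view of (30) the limit of the derivative in (35)
when $t \to 0$ and $z = x_i$, $i = 1, 2, \ldots, N_1$, is infinite). To overcome this, we use the
following derivative with respect to the variable $t^p$:

$$\frac{d}{dt^p}\big(e_{l_p}(X, L(t))\big)\Big|_{t=0} = \lim_{t \to 0} \frac{t^{1-p}}{p} \frac{d}{dt}\big(e_{l_p}(X, L(t))\big). \tag{39}$$

It follows from (26) that

$$\frac{d}{dt^p}\big(\mathrm{dist}(y_i, L(t))^p\big)\Big|_{t=0} = \lim_{t \to 0} \frac{t^{1-p}}{p} \frac{d}{dt}\mathrm{dist}(y_i, L(t))^p = 0. \tag{40}$$

Furthermore, it follows from (35) and (30) (and also its derivation from (29)) that

$$\frac{d}{dt^p}\big(\mathrm{dist}(x_i, L(t))^p\big)\Big|_{t=0} = \lim_{t \to 0} \frac{t^{1-p}}{p} \cdot p \cdot \mathrm{dist}(x_i, L(t))^{p-1} \cdot \frac{d}{dt}\mathrm{dist}(x_i, L(t))$$
$$= \Big(\lim_{t \to 0} \mathrm{dist}(x_i, L(t))/t\Big)^{p-1} \cdot \lim_{t \to 0} \frac{d}{dt}\mathrm{dist}(x_i, L(t)) = \|C \tilde V P_{L_1}(x_i)\|^p. \tag{41}$$

Combining (40) and (41) we obtain that

$$\frac{d}{dt^p}\big(e_{l_p}(X, L(t))\big)\Big|_{t=0} = \sum_{i=1}^{N_1} \|C \tilde V P_{L_1}(x_i)\|^p. \tag{42}$$

Now, if $\mathrm{Sp}(\{x_i\}_{i=1}^{N_1}) = L_1$, then there exists $1 \le j \le N_1$ such that $v_1^T x_j \ne 0$
and thus $\|C \tilde V P_{L_1}(x_j)\| = \|C V x_j\| \ge \theta_1 |v_1^T x_j| > 0$. Combining this observation
with (42) we conclude that $L_1$ is a local minimum of $e_{l_p}(X, L(t))$ and thus prove
Proposition 2.1.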
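
The role of the derivative with respect to $t^p$ can be illustrated with a single inlier in the case $d = 1$, where (29) gives $\mathrm{dist}(x, L(t)) = |a| \sin(t\theta)$ for $x = a v_1$. The sketch below, with hypothetical helper names, compares the two difference quotients at $t = 0$: the quotient in $t^p$ approaches $(|a|\theta)^p$, in line with (41), while the quotient in $t$ grows without bound as $t \to 0$ when $p < 1$.

```python
import math


def dist_inlier(a, theta, t):
    """dist(x, L(t)) for an inlier x = a * v_1 when d = 1."""
    return abs(a) * math.sin(t * theta)


def quotient_in_t_power_p(a, theta, p, t):
    """Difference quotient of dist^p at 0 with respect to the variable t^p."""
    return dist_inlier(a, theta, t) ** p / t ** p


def quotient_in_t(a, theta, p, t):
    """Difference quotient of dist^p at 0 with respect to the variable t."""
    return dist_inlier(a, theta, t) ** p / t
```

For instance, with `a = 0.8`, `theta = 0.5` and `p = 0.5`, the first quotient at small `t` is close to `0.4 ** 0.5`, while the second one is already in the hundreds at `t = 1e-6`.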

3.3. Proof of Theorem 2.2: Combination of Combinatorial Estimates (§3.2) with
Probabilistic Estimates

We assume for simplicity of notation that $R_0 = R_1 = 1$, though the correct scaling
by $R_0$ and $R_1$ is obvious from the proof. To find the probability that $L_1^*$ is a local $l_1$
subspace we will estimate the probabilities of large LHS and small RHS of (15) for
arbitrary $L \in B_G(L_1^*, 1)$. We denote the $N_1$ inliers and $N_0$ outliers by $\{x_i\}_{i=1}^{N_1}$ and
$\{y_i\}_{i=1}^{N_0}$ respectively. Due to the homogeneity of (15) in $C$, we will assume WLOG
that $\|C\|_2 = 1$, i.e., $\theta_1 = 1$.

We start with estimating the probability that the RHS of (15) is small. Applying the
above assumption that $\|C\|_2 = 1$ we have that



and consequently 



e 



Nq J - \ No y/d 

^ — — < ^ ^ Pm 



< 



We further estimate this probability by Hoeffding's inequality as follows: we view
the matrix $B_{L_1^*,X}$ as the sum of random variables $P_{L_1^*}(y_i) P_{L_1^{*\perp}}(y_i)^T / \|P_{L_1^{*\perp}}(y_i)\|$,
$i = 1, \ldots, N_0$. Since the distribution of outliers is spherically symmetric in $B(0, 1)$, the
coordinates of both $P_{L_1^*}(y_i)$ and $P_{L_1^{*\perp}}(y_i)^T / \|P_{L_1^{*\perp}}(y_i)\|$ have expectation 0 and take
values in $[-1, 1]$. We can thus apply Hoeffding's inequality to the sum defining $B_{L_1^*,X}$
and consequently obtain that


Pr - _^ > 1 - 2dDcxp ^ . (43) 
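
For readers less familiar with the tool used here, the following sketch illustrates the standard two-sided Hoeffding bound for a sum $S$ of $n$ independent variables with values in $[a, b]$, namely $\Pr(|S - E(S)| \ge t) \le 2\exp(-2t^2/(n(b-a)^2))$. It compares this bound with a small Monte Carlo estimate for zero-mean variables in $[-1, 1]$, the setting of the matrix entries above. The uniform distribution and all names are assumptions made only for this illustration; the exact constants of (43) are not reproduced.

```python
import math
import random


def hoeffding_bound(n, t, low=-1.0, high=1.0):
    """Two-sided Hoeffding bound for a sum of n independent variables in [low, high]."""
    return 2.0 * math.exp(-2.0 * t * t / (n * (high - low) ** 2))


def empirical_tail(n, t, trials=2000, seed=0):
    """Monte Carlo estimate of Pr(|S| >= t) for S a sum of n uniform[-1, 1] variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if abs(s) >= t:
            hits += 1
    return hits / trials
```

With `n = 200` and `t = 40` the bound equals `2 * exp(-4)`, and the empirical tail frequency is expected to lie below it.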
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Next, we estimate the probability that the LHS of (15) is sufficiently large. We first 
note that 

Ni Ni Ni 

^l|CVPi.(x.)|| >5]|0ivfPi.(x,O| =5^|vfFij(x,)| 

i—1 i—1 i—1 



> 



\ 



i=l \i=l / 



Second of all, since ni is spherically symmetric on i* n B(0, 1) 

£;,,,(PL.(x)Pi.(x)^) - <5Jrf, where 6, = 
We will prove in Appendix A. 7 the following statement; 



(44) 
(45) 



TV, 



then ^mm^ a, Pl^ {^,)Pli (x^)^^ > 6^ - (46) 

We combine (44)-(46) and Hoeffding's inequality to obtain the following probabilistic 
estimate for the LHS of (15): 

'E,=\l|cvPij(x,)|| 



Pr 



> - 77 



(47) 



Ni 

'T^1,Pl'M)Pl'M^V 



> Pr min cr, 

y<j<d 

> Pr max cr, 

\l<J<d 



> Pr 



> Pr I max 

l<p,l<d 



(5* Id 



> (5* - 77 



5J.d < 1] 



< V 



Ni 



l<p,l<c 



< 



>l-2(f exp - 



Nil]"' 



From (43) and (47), (15) is valid with probability at least 



1 - 2d^ cxp 



- 2dD cxp - 



Nn 



Ve, 7] s.t.rJ+-^e<64^il). (48) 

iVi 



We can choose e = iVi(54/ii)/(2iVo), V = <5*/V4.005 and obtain that if A^o = o{Nf) 
then (15) is valid with the probability specified in (19). 
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3.3.1. Proof of Proposition 2.3 

Let $\{y_i\}_{i=1}^{N_0}$ denote the i.i.d. outliers sampled from $\mu_0$. We will prove that for any
$V \in \mathbb{R}^{d \times (D-d)}$:

$$\mu_0\big(y_1 \in \mathbb{R}^D : P_{L_1^*}(y_1) P_{L_1^{*\perp}}(y_1)^T \mathrm{dist}(y_1, L_1^*)^{p-2} = V\big) = 0. \tag{49}$$

Proposition 2.3 follows by substituting $V = -\sum_{i=2}^{N_0} P_{L_1^*}(y_i) P_{L_1^{*\perp}}(y_i)^T \mathrm{dist}(y_i, L_1^*)^{p-2}$
in (49) and applying Proposition 2.2.

We may assume that $y_1 \notin L_1^* \cup L_1^{*\perp}$ since $\mu_0(L_1^*) = \mu_0(L_1^{*\perp}) = 0$. We note
that for any $y_1 \notin L_1^* \cup L_1^{*\perp}$ the rank of $P_{L_1^*}(y_1) P_{L_1^{*\perp}}(y_1)^T$ is 1. Therefore, (49) is
obvious if $\mathrm{rank}(V) \ne 1$. Furthermore, if $\ker(V) \not\supseteq L_1^*$ then (49) is also obvious since
the kernel of $P_{L_1^*}(y_1) P_{L_1^{*\perp}}(y_1)^T$ contains $L_1^*$.

At last, we assume that $\mathrm{rank}(V) = 1$ and $\ker(V) \supseteq L_1^*$ and denote $v = \ker(V)^\perp$.
Applying the assumption that proper affine subspaces of $\mathbb{R}^D$ have $\mu_0$ measure zero and
the assumption $D > d + 1$, we obtain that $\mu_0(\mathrm{Sp}(L_1^*, v)) = 0$. We thus conclude (49)
(and consequently Proposition 2.3) as follows.

$$\mu_0\big(y_1 \in \mathbb{R}^D : P_{L_1^*}(y_1) P_{L_1^{*\perp}}(y_1)^T \mathrm{dist}(y_1, L_1^*)^{p-2} = V\big)$$
$$\le \mu_0\big(y_1 \in \mathbb{R}^D : P_{L_1^{*\perp}}(y_1) = c v \text{ for some } c \in \mathbb{R}\big) = \mu_0\big(y_1 \in \mathbb{R}^D : y_1 \in \mathrm{Sp}(L_1^*, v)\big) = 0.$$

3.4. Proof of Theorem 1.1: From Local Probabilistic Estimates to Global Ones

3.4.1. Proof of the Special Case: K = 1

Part I: $L_1^*$ is a Global $l_p$ Subspace in $B_G(L_1^*, \gamma_1)$

We assume here that there is only one underlying subspace, $L_1^*$, since it is easier to
follow our proof in this case. We prove in this part that there exists a constant $\gamma_1 > 0$
such that w.o.p. $L_1^*$ is the global $l_p$ subspace in $B_G(L_1^*, \gamma_1)$. We arbitrarily choose
$L \in G(D, d)$ such that $\mathrm{dist}_G(L, L_1^*) = 1$ and parameterize a geodesic line from $L_1^*$
to $L$ by a function $L: [0,1] \to G(D,d)$, where $L(0) = L_1^*$ and $L(1) = L$. We then
observe that there exists $\gamma_1 > 0$ such that the function $e_{l_p}(X, L(t)): [0,1] \to \mathbb{R}$ of (1)
has a positive derivative w.o.p. at any $t \in [0, \gamma_1]$ (as explained in §3.2.3 and §3.4.1 we
use the derivative with respect to the variable $t^p$), that is,

$$\frac{d}{dt^p}\left(\frac{\sum_{x \in X} \mathrm{dist}(x, L(t))^p}{N}\right) > 0 \quad \text{for all } t \in [0, \gamma_1] \text{ w.o.p.} \tag{50}$$

We remark that, similarly to §3.2.3, we use the derivative with respect to the variable
$t^p$ (which is clarified in (39)), since the derivative with respect to $t$ of $\mathrm{dist}(x, L(t))^p$ at
$t = 0$ is undefined when $p < 1$ and $x \in X_1$.
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We will deduce (50) from the following two equations: 

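$$\frac{d}{dt^p}\left(\frac{\sum_{x \in X} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=0} > \gamma_2 \quad \text{w.o.p. for some } \gamma_2 > 0 \tag{51}$$

and

$$\left| \frac{d}{dt^p}\left(\frac{\sum_{x \in X} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=0} - \frac{d}{dt^p}\left(\frac{\sum_{x \in X} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=t_0} \right| < \frac{\gamma_2}{2} \quad \forall t_0 \in [0, \gamma_1] \text{ w.o.p.} \tag{52}$$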



When $p = 1$, (51) practically follows from the proof of Theorem 2.2 by arbitrarily
fixing $\epsilon$ and $\eta$ such that $\epsilon \alpha_0/\alpha_1 + \eta + \gamma_2/\alpha_1 < \delta_*$ and noting that when sampling from
the mixture measure specified in the current theorem (unlike Theorem 2.2) the ratio of
sampled outliers to inliers, $N_0/N_1$, goes w.o.p. to $\alpha_0/\alpha_1$. When $p < 1$, (51) follows
from (42). We also observe that 72 = 72(0^07 cti,d, 

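We first verify (52) for the sum of elements in $X_1 = X \cap L_1^*$. In view of (29), for
any $x \in X_1$ the single term in that sum (i.e., $\mathrm{dist}(x, L(t))^p$) has a bounded second
derivative with respect to $t^p$; hence, we can find constants $\gamma_1$ and $\gamma_2$ satisfying

$$\left| \frac{d}{dt^p}\left(\frac{\sum_{x \in X_1} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=0} - \frac{d}{dt^p}\left(\frac{\sum_{x \in X_1} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=t_0} \right| < \frac{\gamma_2}{6} \quad \forall t_0 \in [0, \gamma_1]. \tag{53}$$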



We derive a similar estimate by replacing the summation over $x \in X_1$ by the
summation over $x \in X \setminus X_1$. Using the constant $\gamma_3$, which we clarify below, we separate
the latter sum into two components: $\hat X := \{x \in X \setminus X_1 : \mathrm{dist}(x, L_1^*) < 2\gamma_3\}$ and
$X \setminus (X_1 \cup \hat X)$.

In order to deal with the first sum, we define

$$\gamma_4 = \mu(x : 0 < \mathrm{dist}(x, L_1^*) < 2\gamma_3)$$

and note that we can choose $\gamma_3 = \gamma_3(D, \gamma_2, \mu_0) = \gamma_3(D, d, \alpha_0, \alpha_1, \mu_0, \ldots)$
sufficiently small such that $\gamma_4 = \gamma_4(D, \alpha_0, \alpha_1, \mu_0)$ is arbitrarily small. We use $\gamma_4$ to bound
the ratio of sampled points from $\hat X$ and $X$ as follows:

$$\frac{\#(\hat X)}{\#(X)} < 2\gamma_4 \quad \text{w.o.p.} \tag{54}$$

Indeed, we note that $\#(\hat X) = \sum_{x \in X} I_{\hat X}(x)$, $E(I_{\hat X}(x)) = \mu(x : x \in \hat X) = \gamma_4$ and
$I_{\hat X}(x)$ takes values in $[0, 1]$; therefore, by applying Hoeffding's inequality to
$\sum_{x \in X} I_{\hat X}(x)$, we conclude (54).

Now for $y_i \in \hat X$, the derivatives expressed in (26) and (42) are bounded by 1 since
the support of $\mu_0$ is contained in $B(0, 1)$. Thus, by combining this observation with (54)
we obtain that there exist $\gamma_3$ and $\gamma_4$ such that for any $t_0 \in [0, \gamma_1]$: w.o.p.

$$\left| \frac{d}{dt^p}\left(\frac{\sum_{x \in \hat X} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=0} - \frac{d}{dt^p}\left(\frac{\sum_{x \in \hat X} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=t_0} \right| < \frac{\gamma_2}{6}. \tag{55}$$



Differentiating (26) and (42) one more time, we obtain that for x e A" \ [Xi U X), 
the second derivative of dist(x, Lit)) with respect to t^ is bounded by C{d)/^'^. Thus 
we can choose 71 = 71(72, 73, d) = 71 (ao, ai, /io, /ii, d, i'jp) sufficiently small such 
that for any G [0, 71]: 



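$$\left| \frac{d}{dt^p}\left(\frac{\sum_{x \in X \setminus (X_1 \cup \hat X)} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=0} - \frac{d}{dt^p}\left(\frac{\sum_{x \in X \setminus (X_1 \cup \hat X)} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=t_0} \right| < \frac{\gamma_2}{6}. \tag{56}$$

Equation (52) and consequently (50) are thus verified by combining (53), (55) and (56).
That is, $L_1^*$ is the global $l_p$ subspace in $B_G(L_1^*, \gamma_1)$ for sufficiently small $\gamma_1$.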



Part II: $L_1^*$ is a Global $l_p$ Subspace in $G(D, d)$

We will first show that for all $L \in G(D, d) \setminus B_G(L_1^*, \gamma_1)$ and any fixed $p \le 1$, there
exists some $\gamma_5 > 0$ such that

$$e_{l_p}(X, L) - e_{l_p}(X, L_1^*) > \gamma_5 N, \quad \text{w.o.p.} \tag{57}$$

We can prove (57) in expectation as follows:

{ei^{K,L))-E^ (%(x,iD) = "0 {E^„ (e,^ (x, i)) - £;^„ {eiM,Ll))) 

(58) 

+ "1 (-E^Mi (e/p(x,i)) --E^i {ei^{x,Ll))) = aiE^, (%(x,i:)) 
^ ai(l-Mi({0}))2P'Wf 

The second equality of (58) follows from E^^ (^ei^{x,L)) = Ep^g (e;p(x, Lj)) and 
Ep-^ [ei^{x, LI)) = (since supp(/ii) C L^). The inequality of (58) is a direct ap- 
plication of Lemma 3.1. Setting 75 = ai(l — fii{{0}))2P^f / {TT^/d^i)P and combin- 
ing (58) with Hoeffding's inequality, we obtain (57). 

Now, (57) extends to a small neighborhood of $L$. That is, for any such $L$ we
can find a ball $B_G(L, t)$ for some $t > 0$ such that w.o.p. the subspace $L_1^*$ is a better
$l_p$ subspace than any of the subspaces in that ball. By covering the compact space
$G(D, d) \setminus B_G(L_1^*, \gamma_1)$ with a finite number of such balls we obtain that w.o.p. $L_1^*$ is a
better $l_p$ subspace than any subspace in $G(D, d) \setminus B_G(L_1^*, \gamma_1)$. Combining this observation
with part I, we conclude that w.o.p. $L_1^*$ is the global $l_p$ subspace in $G(D, d)$.
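
The statement just proved can be explored with a small simulation. The sketch below is only a toy illustration, not the paper's experiment: it takes $D = 2$, $d = 1$, $K = 1$, samples inliers uniformly on a segment of a line and outliers uniformly in the unit disk (a spherically symmetric distribution), and minimizes the energy (1) over a grid of candidate lines. The sample sizes, the grid, the seed and all function names are assumptions chosen for the example.

```python
import math
import random


def sample_data(n_in, n_out, true_angle, seed=0):
    """Inliers on the line with direction true_angle, outliers uniform in the unit disk."""
    rng = random.Random(seed)
    dx, dy = math.cos(true_angle), math.sin(true_angle)
    points = []
    for _ in range(n_in):
        r = rng.uniform(-1.0, 1.0)
        points.append((r * dx, r * dy))
    outliers = 0
    while outliers < n_out:
        # Rejection sampling from the square gives the uniform distribution on the disk.
        y = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if math.hypot(y[0], y[1]) <= 1.0:
            points.append(y)
            outliers += 1
    return points


def lp_energy(points, angle, p):
    """Energy (1) for the line through the origin with direction angle."""
    nx, ny = -math.sin(angle), math.cos(angle)
    return sum(abs(nx * x + ny * y) ** p for x, y in points)


def best_grid_index(points, p, grid=180):
    """Index k of the grid angle k*pi/grid that minimizes the l_p energy."""
    return min(range(grid), key=lambda k: lp_energy(points, k * math.pi / grid, p))
```

With 600 inliers on the grid line of index 40 and 1400 outliers, i.e., with outliers forming the majority, `best_grid_index` is expected to return 40 for both `p = 1.0` and `p = 0.5`.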
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3.4.2. Extension of the Proof to K > 1 

Part I: $L_1^*$ is a Global $l_p$ Subspace in $B_G(L_1^*, \gamma_1)$

We maintain the same notation of §3.4.1, especially for similar constants. We will show
in this part that w.o.p. $L_1^*$ is a global $l_p$ subspace in the ball $B_G(L_1^*, \gamma_1)$, where $\gamma_1$ is a
sufficiently small constant.

In order to do so, we arbitrarily fix $L \in G(D, d)$ such that $\mathrm{dist}_G(L, L_1^*) = 1$ (so that
the matrix $C$ of principal angles between $L$ and $L_1^*$ is in $NS_+(d)$) and parameterize a
geodesic line from $L_1^*$ to $L$ by a function $L: [0,1] \to G(D,d)$, where $L(0) = L_1^*$
and $L(1) = L$. We will then estimate the probability that for any such $L$ the function
$e_{l_p}(X, L(t)): [0,1] \to \mathbb{R}$ has a positive derivative at any $t \in (0, \gamma_1)$, that is

$$\frac{d}{dt^p}\left(\frac{\sum_{x \in X} \mathrm{dist}(x, L(t))^p}{N}\right) > 0 \quad \text{for all } t \in (0, \gamma_1). \tag{59}$$

First of all, we prove that there exists a constant $\gamma_2 > 0$ such that

$$\frac{d}{dt^p}\left(\frac{\sum_{x \in X} \mathrm{dist}(x, L(t))^p}{N}\right)\Big|_{t=0} > \gamma_2 \quad \text{w.o.p.} \tag{60}$$

We start the proof with the case where $p = 1$ and decompose the sampled data set as
follows: $X = \cup_{i=0}^{K} X_i$, where $X_i$ is the set of points sampled from $\mu_i$ for all $0 \le i \le K$.
It follows from (15) that the event in (60) is the same as the event

$$\frac{\sum_{x \in X_1} \|C \tilde V P_{L_1^*}(x)\| - \|C \tilde V B_{L_1^*, X \setminus X_1}\|_*}{N} > \gamma_2 \quad \forall C \in NS_+(d) \text{ and } \tilde V \in O(d). \tag{61}$$

We will prove (61) in two steps. In the first step we will fix matrices $C_0 \in NS_+(d)$
and $\tilde V_0 \in O(d)$ and show that

$$\frac{\sum_{x \in X_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| - \|C_0 \tilde V_0 B_{L_1^*, X \setminus X_1}\|_*}{N} > 2\gamma_2 \quad \text{w.p.} \ge 1 - (2D^2 + 1)\exp(-2N\gamma_2^2), \tag{62}$$

where $\gamma_2 := \beta_0 E_{\mu_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| / 6$ and $\beta_0 = \alpha_1 - \sum_{i=2}^{K} \alpha_i$. In the second step
we will combine a covering argument and (62) to prove (61).

In order to prove (62), we will first verify the following two probabilistic inequalities:

$$\frac{\|C_0 \tilde V_0 B_{L_1^*, X_0}\|_*}{N} < 2\gamma_2 \quad \text{w.p.} \ge 1 - 2D^2 \exp(-2\gamma_2^2 N) \tag{63}$$

and

$$\frac{\sum_{x \in X_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| - \sum_{x \in X \setminus (X_1 \cup X_0)} \|C_0 \tilde V_0 P_{L_1^*}(x)\|}{N} > 4\gamma_2 \quad \text{w.p.} \ge 1 - \exp(-2N\gamma_2^2). \tag{64}$$

To prove (63) we define $J_0(x) = I(x \in X_0) P_{L_1^*}(x) P_{L_1^{*\perp}}(x)^T / \mathrm{dist}(x, L_1^*)$. We
note that its elements lie in $[-1, 1]$ and $E_{\mu_0}(J_0(x)) = 0$. Indeed, denoting
$R_{L_1^*}(x) = P_{L_1^*}(x) - P_{L_1^{*\perp}}(x)$ (i.e., $R_{L_1^*}(x)$ is the reflection of $x$ w.r.t. the $d$-subspace $L_1^*$)
we obtain that

$$2 E_{\mu_0}(J_0(x)) = E_{\mu_0}(J_0(x)) - E_{\mu_0}(J_0(R_{L_1^*}(x))) = 0,$$

where the first equality is clear since $P_{L_1^*}(x) P_{L_1^{*\perp}}(x)^T = -P_{L_1^*}(R_{L_1^*}(x)) P_{L_1^{*\perp}}(R_{L_1^*}(x))^T$
and the second one follows from the symmetry of $\mu_0$. Therefore, combining the fact
that

D„-=efDe. < max u^Dv/||u|| j|v|| = ||D||*, 
u,veKO 

for any D e M^^^ and 1 < i,j < N, and Hoeffding's inequality for the random 
variable Jo(x), we establish the following inequality, which clearly implies (63); 



Pr II ^ P,.j(x)Pi-.(x)^/dist(x,Lt)IU/iV < 272 



>Pr(l| ^ Pij(x)P^.(x)^/dist(x,Lt)|U/iV< 272 ) > 1 - 2 exp{2j', N). 

(65) 

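To prove (64), we define the random variable
$J_1(x) = \big(I(x \in X_1) - I(x \in X \setminus (X_1 \cup X_0))\big) \|C_0 \tilde V_0 P_{L_1^*}(x)\|$ and using the spherical
symmetry of the measures $\mu_i$ we have

$$E\left(\frac{\sum_{x \in X_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| - \sum_{x \in X \setminus (X_1 \cup X_0)} \|C_0 \tilde V_0 P_{L_1^*}(x)\|}{N}\right)$$
$$= \alpha_1 E_{\mu_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| - \sum_{i=2}^{K} \alpha_i E_{\mu_i} \|C_0 \tilde V_0 P_{L_1^*}(x)\|$$
$$\ge \alpha_1 E_{\mu_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| - \sum_{i=2}^{K} \alpha_i E_{\mu_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\|$$
$$= \beta_0 E_{\mu_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| = 6\gamma_2. \tag{66}$$

We conclude (64) by applying Hoeffding's inequality to the random variable $J_1(x)$,
while using the facts that its expectation is at least $6\gamma_2$ and its values are in $[-1, 1]$.
At last, we conclude (62) via (63) and (64). We first observe that

$$\|C_0 \tilde V_0 B_{L_1^*, X \setminus X_1}\|_* \le \|C_0 \tilde V_0 B_{L_1^*, X \setminus (X_1 \cup X_0)}\|_* + \|C_0 \tilde V_0 B_{L_1^*, X_0}\|_* \tag{67}$$

and

$$\|C_0 \tilde V_0 B_{L_1^*, X \setminus (X_1 \cup X_0)}\|_* = \Big\| C_0 \tilde V_0 \sum_{x \in X \setminus (X_1 \cup X_0)} P_{L_1^*}(x) P_{L_1^{*\perp}}(x)^T / \mathrm{dist}(x, L_1^*) \Big\|_*$$
$$\le \sum_{x \in X \setminus (X_1 \cup X_0)} \big\| C_0 \tilde V_0 P_{L_1^*}(x) P_{L_1^{*\perp}}(x)^T / \|P_{L_1^{*\perp}}(x)\| \big\|_* \le \sum_{x \in X \setminus (X_1 \cup X_0)} \|C_0 \tilde V_0 P_{L_1^*}(x)\|. \tag{68}$$

Applying (67) and (68), we bound the LHS of (62) from below by the difference between the
LHS of (64) and the LHS of (63) as follows:

$$\frac{\sum_{x \in X_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| - \|C_0 \tilde V_0 B_{L_1^*, X \setminus X_1}\|_*}{N}$$
$$\ge \frac{\sum_{x \in X_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| - \|C_0 \tilde V_0 B_{L_1^*, X \setminus (X_1 \cup X_0)}\|_* - \|C_0 \tilde V_0 B_{L_1^*, X_0}\|_*}{N}$$
$$\ge \frac{\sum_{x \in X_1} \|C_0 \tilde V_0 P_{L_1^*}(x)\| - \sum_{x \in X \setminus (X_1 \cup X_0)} \|C_0 \tilde V_0 P_{L_1^*}(x)\|}{N} - \frac{\|C_0 \tilde V_0 B_{L_1^*, X_0}\|_*}{N}. \tag{69}$$

Equation (62) is thus an immediate consequence of (63), (64) and (69).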

Next we address the second step of the proof of (61). That is, we combine a covering
argument and (62) to conclude (61) (which is valid for all matrices $C \in NS_+(d)$ and
$\tilde V \in O(d)$). We define

$$\mathrm{dist}_{NS_+(d) \times O(d)}((C_1, V_1), (C_2, V_2)) := \max(\|C_1 - C_2\|_2, \|V_1 - V_2\|_2) \tag{70}$$

and note that whenever $\mathrm{dist}_{NS_+(d) \times O(d)}((C_1, V_1), (C_2, V_2)) < \gamma_2/2$ and $x \in B(0, 1)$
we have that

$$\|C_1 V_1 P_{L_1^*}(x)\| - \|C_2 V_2 P_{L_1^*}(x)\|$$
$$= \big(\|C_1 V_1 P_{L_1^*}(x)\| - \|C_2 V_1 P_{L_1^*}(x)\|\big) + \big(\|C_2 V_1 P_{L_1^*}(x)\| - \|C_2 V_2 P_{L_1^*}(x)\|\big)$$
$$\le \|C_1 - C_2\|_2 + \|C_2\|_2 \|V_1 - V_2\|_2 < \gamma_2. \tag{71}$$

Combining (62) and (71) we obtain that for $(C, V)$ in a ball in $NS_+(d) \times O(d)$ of
radius $\gamma_2/2$ and center $(C_0, V_0)$:

Ex.^. I|CVPl>(x)|| - E^.^^x. ||CVP,.(x)H 

^ > 72 w.p. > 1 - exp(-2A^7^). 

(72) 

We easily extend (72) to all pairs of matrices $(C, V)$ in the compact space $NS_+(d) \times O(d)$
(with the distance specified in (70)). Indeed, it follows from [29, Theorem 7] that
$O(d)$ can be covered by $C_1'^{\,d(d-1)/2} / (\gamma_2/2)^{d(d-1)/2}$ balls of radius $\gamma_2/2$ for some
$C_1' > 0$ (note that the dimension of $O(d)$ is $d(d-1)/2$). Since $NS_+(d)$ is isomorphic
to $S^{d-1}$, it follows from [32, Lemma 5.2] that it can be covered by $3^d / (\gamma_2/2)^d$
balls of radius $\gamma_2/2$. Therefore, the product space $NS_+(d) \times O(d)$ with the distance
defined in (70) can be covered by $C_1^{d(d+1)/2} / (\gamma_2/2)^{d(d+1)/2}$ balls of radius $\gamma_2/2$, where
$C_1 := \max(C_1', 3)$, and consequently

(61) is valid for any $C \in NS_+(d)$ and $\tilde V \in O(d)$
$$\text{w.p. } 1 - C_1^{d(d+1)/2} \exp(-2N\gamma_2^2) / (\gamma_2/2)^{d(d+1)/2}, \tag{73}$$

which means that (60) with $p = 1$ holds with the probability specified in (73).

When $0 < p < 1$, it follows from (42) and Hoeffding's inequality that (60) holds
with fixed $C$ and $\tilde V$ w.p. $1 - \exp(-2N\gamma_2^2)$, where $\gamma_2 = \alpha_1 \cdot E_{\mu_1}(\|C \tilde V P_{L_1^*}(x)\|^p)/2$.
Following the same covering argument as in the proof of (73), we conclude that (60)
holds with the same probability specified in (73) (though $\gamma_2$ is defined differently for
$p = 1$ and $0 < p < 1$).

Equation (59) follows w.o.p. from (60) in exactly the same way of deriving (50)
from (51) (with the help of (52), which is deterministic and easily extends to the current
case). While we did not estimate the overwhelming probability for (50), it is easy to
show that in the current case, (60) implies (59) w.p. $1 - \exp(-N\gamma_6)/\gamma_6$. Carrying out
this analysis, one notices that both $\gamma_1$ and $\gamma_6$ depend on $d$, $K$, $\alpha_0$, $\alpha_1$, $\mu_0$, $\mu_1$, $p$ and
$\min_{2 \le i \le K}(\mathrm{dist}_G(L_1^*, L_i^*))$. Combining this with (73), we obtain that

$L_1^*$ is a global $l_p$ subspace in $B_G(L_1^*, \gamma_1)$
$$\text{w.p. } 1 - C_1^{d(d+1)/2} \exp(-2N\gamma_2^2) / (\gamma_2/2)^{d(d+1)/2} - \exp(-N\gamma_6)/\gamma_6. \tag{74}$$

Part II: $L_1^*$ is a Global $l_p$ Subspace in $G(D, d)$

We will first prove that $L_1^*$ is a global $l_p$ subspace w.o.p. in $G(D,d) \setminus B_G(L_1^*, \gamma_1)$.
Applying Lemma 3.3 we obtain that for all $2 \le i \le K$:

$$E_{\mu_1}\big(\mathrm{dist}(x, L)^p - \mathrm{dist}(x, L_1^*)^p\big) + E_{\mu_i}\big(\mathrm{dist}(x, L)^p - \mathrm{dist}(x, L_1^*)^p\big) \ge 0. \tag{75}$$

Further application of Lemma 3.1 with $L \in G(D, d) \setminus B_G(L_1^*, \gamma_1)$ results in the
inequality:

..,.(..,(x,.))> "-''-"°'»^;-'^^- . (76, 
Now, combining (75) and (76) we have that 
Ef, (dist(x, Lf- dist(x, Ll)P) 

K 

= (dist(x, LY - dist(x, Lir) + E^^ (dist(x, Lf - dist(x, Llf)) 

i=2 

+ PoE,, (dist(x, LY - dist(x, Lir) > 7^"""^- . (77) 

We define 

/?o.(l-Mi({0})).2^-i.7f 
77 = r- ^ , (78) 

and note that it depends on $d$, $K$, $\mu_0$, $\mu_1$, $\alpha_0$, $\alpha_1$ and $\min_{2 \le i \le K}(\mathrm{dist}_G(L_1^*, L_i^*))$.
Applying Hoeffding's inequality to $\mathrm{dist}(x, L)^p - \mathrm{dist}(x, L_1^*)^p$, whose absolute values are
uniformly bounded by 1 and whose expectation is at least $\gamma_7$ (which follows from (77) and
(78)), we obtain that for any $L \in G(D, d) \setminus B_G(L_1^*, \gamma_1)$:

$$e_{l_p}(X, L) - e_{l_p}(X, L_1^*) > \frac{\gamma_7 N}{2} \quad \text{w.p.} \ge 1 - \exp(-N\gamma_7^2/8). \tag{79}$$

By Lemma 3.2 we have that for any $L' \in G(D, d)$ satisfying $\mathrm{dist}_G(L, L') <
(\gamma_7/4)^{1/p}$ and any $x \in B(0, 1)$:

$$|\mathrm{dist}(x, L')^p - \mathrm{dist}(x, L)^p| < \gamma_7/4.$$

Consequently, for any $L \in G(D, d) \setminus B_G(L_1^*, \gamma_1)$ and all $L' \in B_G(L, (\gamma_7/4)^{1/p})$:

$$e_{l_p}(X, L') - e_{l_p}(X, L_1^*) > \frac{\gamma_7 N}{4} \quad \text{w.p.} \ge 1 - \exp(-N\gamma_7^2/8). \tag{80}$$

We can cover $G(D, d) \setminus B_G(L_1^*, \gamma_1)$ by $C_2^{d(D-d)} / \gamma_7^{d(D-d)/p}$ balls of radius $(\gamma_7/4)^{1/p}$
(this follows from Remark 8.4 of [28]). Now, for each such ball we have that (79) is
valid for its center w.p. $1 - \exp(-N\gamma_7^2/8)$ and consequently (80) is valid for subspaces
in that ball with the same probability. We thus conclude that (80) is valid for all
$L' \in G(D,d) \setminus B_G(L_1^*, \gamma_1)$ w.p. $1 - \exp(-N\gamma_7^2/8) C_2^{d(D-d)} / \gamma_7^{d(D-d)/p}$. Combining
this with (74), we obtain that the probability that $L_1^*$ is a global $l_p$ subspace in $G(D, d)$
is at least

$$1 - C_1^{d(d+1)/2} \exp(-2N\gamma_2^2) / (\gamma_2/2)^{d(d+1)/2} - \exp(-N\gamma_6)/\gamma_6 - \exp(-N\gamma_7^2/8) C_2^{d(D-d)} / \gamma_7^{d(D-d)/p}$$

or equivalently, $1 - C \exp(-N/C)$ for some $C$ depending on $D$, $d$, $K$, $\mu_0$, $\mu_1$, $\alpha_0$, $\alpha_1$,
$p$ and $\min_{2 \le i \le K}(\mathrm{dist}_G(L_1^*, L_i^*))$.



3.5. Proof of Theorem 1.2: Stability Analysis 

3.5.1. Reduction of Theorem 1.2 

We first explain how to reduce the proof of Theorem 1.2 when $0 < p \le 1$ to the
verification of a simpler statement. We then adapt this idea for proving the same theorem
when both $p > 1$ and $K = 1$.

In order to prove Theorem 1.2 when $0 < p \le 1$, i.e., prove that the global minimum
of $e_{l_p}(X, L)$ is in $B_G(L_1^*, f)$ w.o.p., we only need to show that there exists a constant
$\rho_1 > 0$ such that for any $L \notin B_G(L_1^*, f)$:

$$E_{\mu_\epsilon}(e_{l_p}(x, L)) > E_{\mu_\epsilon}(e_{l_p}(x, L_1^*)) + \rho_1. \tag{81}$$

Indeed, we cover the compact space $G(D, d) \setminus B_G(L_1^*, f)$ by small balls with radius
$\rho_1/2$. Then by using (81) and Hoeffding's inequality, we obtain that $e_{l_p}(X, L) >
e_{l_p}(X, L_1^*)$ for any $L$ in each such ball w.o.p. Therefore, $e_{l_p}(X, L) > e_{l_p}(X, L_1^*)$ for
$L \in G(D, d) \setminus B_G(L_1^*, f)$ w.o.p. Equivalently, $G(D, d) \setminus B_G(L_1^*, f)$ does not contain
the global minimum of $e_{l_p}(X, L)$ w.o.p.

We further reduce (81) by using the measure /i instead of fi^ (see §1.3). We note 
that for i = \, . . . ,K, Hi coincides with the projection of /^t; x Vi_(^ onto L* (that is, for 
any set E <Z L*: p,i{E) = x Vi^e{PJ^}{E))). Combining this observation with the 
triangle inequality and the concavity of we obtain that 



\E,^^,^^SeiM.L))-E,M^{^,L))\ = \E,^^,^_MPL-{^^^^^ 
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Summing (82) over all 1 < i < K, we have 

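$$|E_{\mu_\epsilon}(e_{l_p}(x, L)) - E_\mu(e_{l_p}(x, L))| < \epsilon^p. \tag{83}$$

Hence, in order to prove (81) and thus Theorem 1.2 for $p \le 1$, the following equation
is sufficient:

$$E_\mu(e_{l_p}(x, L)) > E_\mu(e_{l_p}(x, L_1^*)) + \rho_1 + 2\epsilon^p \quad \text{for any } L \in G(D, d) \setminus B_G(L_1^*, f). \tag{84}$$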

We can similarly reduce Theorem 1.2 when $K = 1$ and $p > 1$. However, (82) needs
to be modified since the $p$th power function is not concave when $p > 1$. For this purpose
we note that for any $x_1, x_2 \in B(0, 1)$

$$\mathrm{dist}(x_1, L)^p - \mathrm{dist}(x_2, L)^p \le 1 - (1 - \mathrm{dist}(x_1, x_2))^p \le p \cdot \mathrm{dist}(x_1, x_2). \tag{85}$$

Indeed, when $p = 1$ (85) is immediate (it is equivalent to $\|P_{L^\perp}(x_2 - x_1)\| \le \|x_2 - x_1\|$)
and it extends to $p > 1$ by the following proposition: if $0 \le y_1, y_2 \le 1$, $y_1 - y_2 \le \eta$
and $p > 1$, then $y_1^p - y_2^p \le 1 - (1 - \eta)^p$. Combining (85) with the derivation of (82),
we conclude the following analog of (82) in the current case:

$$|E_{\mu_\epsilon}(e_{l_p}(x, L)) - E_\mu(e_{l_p}(x, L))| < p \cdot \epsilon. \tag{86}$$
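
The scalar proposition behind (85), together with the Bernoulli-type bound $1 - (1 - \eta)^p \le p\eta$, can be sanity-checked on a grid. The sketch below takes $\eta = |y_1 - y_2|$, an admissible choice since it satisfies $y_1 - y_2 \le \eta$, and reports the largest observed violation of either inequality. The function name and the grid resolution are arbitrary choices for illustration.

```python
import math


def worst_gap(p, steps=200):
    """Largest violation of y1^p - y2^p <= 1 - (1 - eta)^p <= p * eta on a grid in [0, 1]^2."""
    worst = -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1):
            y1 = i / steps
            y2 = j / steps
            eta = abs(y1 - y2)  # admissible: y1 - y2 <= eta and 0 <= eta <= 1
            middle = 1.0 - (1.0 - eta) ** p
            gap_first = (y1 ** p - y2 ** p) - middle
            gap_second = middle - p * eta
            worst = max(worst, gap_first, gap_second)
    return worst
```

For exponents such as `p = 1.5` or `p = 3.0`, `worst_gap(p)` should be non-positive up to rounding error, with equality in the first inequality attained when `y1 = 1`.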

Consequently, we reduce (81) (and thus Theorem 1.2) when $K = 1$ and $p > 1$ to the
following condition:

$$E_\mu(e_{l_p}(x, L)) > E_\mu(e_{l_p}(x, L_1^*)) + \rho_1 + 2p\epsilon \quad \text{for any } L \in G(D, d) \setminus B_G(L_1^*, f). \tag{87}$$

3.5.2. Proof of (84) and (87) and Conclusion of Theorem 1.2 

We arbitrarily fix $L \in G(D, d) \setminus B_G(L_1^*, f)$. We assume first that $0 < p \le 1$ and apply
Lemma 3.3 to obtain that

K 

= ^a, (£'^i+^^e;p(x,i) - E^^+^^ei^{-K,Ll)) > 0. 

1=2 

Consequently, we prove (84) with pi := 26^ as follows: 

E^{ei^{^,L))-E^{ei^{,^,L\)) > (^ai-J2a}j (e,^ (x, i)) (88) 

(1 -Ml ({□})) 2^^-^^ 
> = AeP 

{■K-Vd-iiy 

where the second inequality applies Lemma 3.1 and the last equality uses the fact that
the term $\alpha_0 + 2\alpha_1 - 1$ in the definition of $f$ equals $(\alpha_1 - \sum_{i=2}^{K} \alpha_i)$. Equation (8) is
obtained by solving for $f$ in the last equality of (88).

Equation (87) (with $p > 1$) follows from the same argument of (88), where $\epsilon^p$ is
now replaced by $p\epsilon$. Equation (9) is deduced in a similar way to (8).






3.6. Proof of Theorem 1.3: Symmetry Arguments 

3.6.1. First Reduction of Theorem 1.3 

Theorem 1.3 states that the global $l_p$ subspace is not in $B_G(L_1^*, \kappa_0)$ w.o.p. for almost
every $\{L_i^*\}_{i=1}^{K} \in G(D, d)^K$. We claim that it reduces to the following simple equation:

iD.d ( {i^fli C G{D, d):Ll^ argmin E^{ei^i^, L))] = 0. (89) 

y LeG(D,d] ) 

Indeed, if (89) is not satisfied, then for Lq = arg min^^Q^^ [ei^ (x, L) and any K 
d-subspaces {L* in a subset of G{D, d)^ with nonzero 7^' ^ measure, the constant 

Ci :-£;^(e,^(x,it))-i?^(e;^(x,Lo)) 
is positive. For any L* E Bq{LI, kq) and x S supp(/i) C B(0, 1) 

dist(x,i*)P-dist(x,L*)P < F - (1 -distG(i*,i*))P < p • distG(L*, L*) 
and therefore 

E^{ei^{jc,L*)) > E^{ei^{^,Ll))- Ko-P- (90) 

Letting ^0 = ^0 = Ci/4pe, we obtain from (86) (using the fact that e < 60) and (90) 
that 

E^SeiM^L*)) - E^SeiM,Lo)) > E^{ei^i^, L*)) - E^iei^{^, Lq)) - 26oP 
> Ef,{ei^{yL,Ll)) - i;^(e/p(x,io)) - 2(5oP - kqP = ^■ 
Therefore, by Hoeffding's inequality: 

CiN 

ei^{.X,L*) ~ ei^{X,LQ) > w.o.p. 

In order to have 

ei^{X,L*) - ei^{X,Lo) > for all L* G Bg(LI,«:o) w.o.p., 

we cover $B_G(L_1^*, \kappa_0)$ by small balls with radius $C_1/16$, so that $e_{l_p}(X, L) > e_{l_p}(X, L_0)$
for all $L$ in each such ball w.o.p. Therefore, $e_{l_p}(X, L) > e_{l_p}(X, L_0)$ for all $L \in
B_G(L_1^*, \kappa_0)$ w.o.p. Equivalently, $B_G(L_1^*, \kappa_0)$ will not contain the global minimum of
$e_{l_p}(X, L)$ w.o.p. This contradicts Theorem 1.3 and therefore (89) implies this theorem.

3.6.2. Second Reduction of Theorem 1.3 
We define the operator 

Dl,x,p = Pl (x)P^(x)^dist(x, (91) 
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and the function 

h{LlL*) = ii;^.(DL.,x,p), Q<i^l<K. 

In view of Proposition 2.2, (89) follows from the condition: 

Ida {{LlYli C G{D, d) : (Dij,.,^) = O) = 

which we rewrite as follows: 

Ida {{LDti C G{D,d) : (D^j.^.^) = O) 
/ 

{L*}t,(lG{D,d):E ^ fD 



i = 



= 



K 



K 

=1da 



{L*}t^ C G{D,d) : Y^a, h{Ll,Ll) = U 0. 



1 = 



(92) 



(93) 



Since are identically and independently distributed according to 7£).ti, Fu- 

bini's Theorem implies that (93) follows from the equation: 



IDA [L; e GiD, d) : h{Ll,Ll) ^ H(Ll, L^, • • • , L\)) = 0, 



where 



K 



(94) 



(95) 



i = 



3.6.3. Third Reduction of Theorem 1.3 

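We denote the principal angles between $L_1^*$ and $L_2^*$ by $\{\theta_j\}_{j=1}^d$, the principal vectors of $L_2^*$ and $L_1^*$ by $\{\hat v_j\}_{j=1}^d$ and $\{v_j\}_{j=1}^d$ respectively and the complementary orthogonal system for $L_2^*$ w.r.t. $L_1^*$ by $\{u_j\}_{j=1}^d$. Note that $h(L_1^*, L_2^*)$, as a function of $x$, maps $\operatorname{Sp}(\{u_j\}_{j=1}^d)$ to $\operatorname{Sp}(\{v_j\}_{j=1}^d)$. Now, transforming $x \in L_2^* \cap B(0,1)$ to $\{a_i\}_{i=1}^d$ in a $d$-dimensional unit ball by $x = \sum_{i=1}^d a_i \hat v_i$, we have that for any $1 \le i_1, i_2 \le d$:

\[ v_{i_1}^T h(L_1^*, L_2^*) u_{i_2} = E_{\mu_2}\big( v_{i_1}^T P_{L_1^*}(x) P_{L_1^*}^{\perp}(x)^T u_{i_2} \operatorname{dist}(x, L_1^*)^{p-2} \big) \]
\[ = \int \cos(\theta_{i_1}) a_{i_1} \sin(\theta_{i_2}) a_{i_2} \Big( \sum_{j=1}^d a_j^2 \sin^2 \theta_j \Big)^{\frac{p-2}{2}} \, d\mu_2. \]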



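When $i_1 \neq i_2$, the function

\[ \cos(\theta_{i_1}) a_{i_1} \sin(\theta_{i_2}) a_{i_2} \Big( \sum_{j=1}^d a_j^2 \sin^2 \theta_j \Big)^{\frac{p-2}{2}} \]
is odd w.r.t. $a_{i_1}$ and consequently

\[ \int \cos(\theta_{i_1}) a_{i_1} \sin(\theta_{i_2}) a_{i_2} \Big( \sum_{j=1}^d a_j^2 \sin^2 \theta_j \Big)^{\frac{p-2}{2}} \, d\mu_2 = 0. \]

Therefore, when we form $V$ and $U$ as in (28), the $d \times d$ matrix $V h(L_1^*, L_2^*) U^T$ is diagonal with the elements

\[ \int \cos(\theta_j) \sin(\theta_j) a_j^2 \Big( \sum_{i=1}^d a_i^2 \sin^2 \theta_i \Big)^{\frac{p-2}{2}} \, d\mu_2. \]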
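The diagonal structure can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes $d = 2$, $D = 4$, $p = 1.5$, two arbitrarily chosen principal angles, and it replaces $E_{\mu_2}$ by an average over a sign-symmetrized sample of the unit ball in $L_2^*$. All function and variable names are hypothetical.

```python
import itertools
import numpy as np

def sample_symmetric_coeffs(n, d, rng):
    """Draw n points in the unit d-ball and add all coordinate sign flips of each."""
    a = rng.normal(size=(n, d))
    a /= np.linalg.norm(a, axis=1, keepdims=True)      # uniform directions
    a *= rng.uniform(size=(n, 1)) ** (1.0 / d)          # radii of a uniform ball
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
    return (a[:, None, :] * signs[None, :, :]).reshape(-1, d)

def empirical_h(V, U, theta, coeffs, p):
    """Sample average of D_{L1,x,p} = P_{L1}(x) P_{L1}^perp(x)^T dist(x, L1)^(p-2), x in L2."""
    V_hat = np.cos(theta)[:, None] * V + np.sin(theta)[:, None] * U  # rows span L2
    X = coeffs @ V_hat                  # points x = sum_i a_i v_hat_i (one per row)
    P = X @ V.T @ V                     # projections onto L1
    P_perp = X - P                      # projections onto the orthogonal complement
    w = np.linalg.norm(P_perp, axis=1) ** (p - 2)
    return (P * w[:, None]).T @ P_perp / len(X)

rng = np.random.default_rng(0)
d, D, p = 2, 4, 1.5                     # assumed toy dimensions and exponent
V = np.eye(D)[:d]                       # rows: principal vectors v_1, v_2 of L1
U = np.eye(D)[d:]                       # rows: complementary vectors u_1, u_2
theta = np.array([0.3, 1.1])            # assumed principal angles
coeffs = sample_symmetric_coeffs(2000, d, rng)

h = empirical_h(V, U, theta, coeffs, p)
M = V @ h @ U.T                         # the d x d matrix V h U^T

# Diagonal entries predicted by the integral formula, evaluated on the same sample.
dist_sq = (coeffs ** 2 * np.sin(theta) ** 2).sum(axis=1)
diag_formula = (np.cos(theta) * np.sin(theta) * coeffs ** 2
                * dist_sq[:, None] ** ((p - 2) / 2)).mean(axis=0)
```

Because the sample is closed under sign flips of each coefficient, the off-diagonal entries of `M` cancel up to rounding, which mirrors the oddness argument used in the text.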



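Notice that $V^T V h(L_1^*, L_2^*) = h(L_1^*, L_2^*) = h(L_1^*, L_2^*) U^T U$ and that $h(L_1^*, L_2^*)$ has the following singular values, where $j = 1, \dots, d$:

\[ \lambda_j(h(L_1^*, L_2^*)) = \int \cos(\theta_j) \sin(\theta_j) a_j^2 \Big( \sum_{i=1}^d a_i^2 \sin^2 \theta_i \Big)^{\frac{p-2}{2}} \, d\mu_2. \]

We arbitrarily fix $L_1^*, L_3^*, L_4^*, \dots, L_K^*$ and denote the singular values of $H$ (which is defined in (95)) by $\{\sigma_i\}_{i=1}^D$ and observe that (94) is implied by the following equation:

\[ \gamma_{D,d}\big( L_2^* \in \mathrm{G}(D,d) : \lambda_1(h(L_1^*, L_2^*)) \in \{\sigma_i\}_{i=1}^D \big) = 0, \tag{96} \]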

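which we express as:

\[ \gamma_{D,d}\Big( L_2^* \in \mathrm{G}(D,d) : \int \cos(\theta_1) \sin(\theta_1) a_1^2 \Big( \sum_{i=1}^d a_i^2 \sin^2 \theta_i \Big)^{\frac{p-2}{2}} \, d\mu_2 \in \{\sigma_i\}_{i=1}^D \Big) = 0. \tag{97} \]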



3.6.4. Proof of (97) and Conclusion of Theorem 1.3
We first conclude (97) when $p = 2$. In this case

\[ \lambda_1(h(L_1^*, L_2^*)) = \int \cos(\theta_1) \sin(\theta_1) a_1^2 \Big( \sum_{i=1}^d a_i^2 \sin^2 \theta_i \Big)^{\frac{p-2}{2}} \, d\mu_2 = \int \cos(\theta_1) \sin(\theta_1) a_1^2 \, d\mu_2 \tag{98} \]

is a monotone function of $\theta_1$ on $[0, \pi/4]$ as well as $[\pi/4, \pi/2]$. That is, the requirement that $\lambda_1(h(L_1^*, L_2^*)) \in \{\sigma_i\}_{i=1}^D$ can occur only at discrete values of $\theta_1$ (at most $2D$) and consequently has $\gamma_{D,d}$ measure 0, that is, (97) (and consequently (89)) is verified in this case.
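As a small illustration of the counting step for $p = 2$, the sketch below evaluates $\cos(\theta)\sin(\theta)$ times a constant second moment on a grid and counts how often it crosses a few levels. The second moment is normalized to 1 and the levels are arbitrary choices made only for this example; the function names are hypothetical.

```python
import numpy as np

def lambda_p2(theta, second_moment=1.0):
    """Value of (98) for p = 2: cos(theta) sin(theta) times a constant second moment."""
    return np.cos(theta) * np.sin(theta) * second_moment

def count_crossings(level, grid_size=200001):
    """Count how often lambda_p2 crosses the given level on a fine grid of [0, pi/2]."""
    theta = np.linspace(0.0, np.pi / 2, grid_size)
    above = lambda_p2(theta) > level
    return int(np.sum(above[1:] != above[:-1]))

theta = np.linspace(0.0, np.pi / 2, 1001)
vals = lambda_p2(theta)
first_half = vals[theta <= np.pi / 4]    # increasing branch
second_half = vals[theta >= np.pi / 4]   # decreasing branch
crossings = [count_crossings(s) for s in (0.1, 0.25, 0.4)]
```

Each level below the maximum $1/2$ is attained once on each monotone branch, so a finite set of levels produces only finitely many admissible angles.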






If $p \neq 2$ and $\{\theta_i\}_{i=1}^{d-1}$ are fixed, then



p-2 




(99) 



is a monotone function of $\theta_d$. Following a similar argument, we obtain that



(100) 



Combining (100) with Fubini's Theorem, we conclude (97). 
3.6.5. Remark on the Size of $\delta_0$ and $\kappa_0$

The above constants $\delta_0$ and $\kappa_0$ depend on other parameters of the underlying spherically symmetric HLM model, in particular the underlying subspaces $\{L_i^*\}_{i=1}^K$. We recall that $\kappa_0 = \delta_0 = C_1/(4p)$, where $C_1 = E_\mu(e_{\ell_p}(x, L_1^*)) - \min_{L \in \mathrm{G}(D,d)} E_\mu(e_{\ell_p}(x, L))$. Therefore, in order to bound $\kappa_0$ and $\delta_0$ from below, we bound $C_1$ from below as follows:



We include the proof of (101) in §A.8. It also leads to a lower bound for the constants $\delta_0$ and $\kappa_0$ of [19], which is better than the one mentioned there (§4.5.5).

We derive (11) from (101) as follows. We recall that (11) applies to the case where $K = 2$, $\alpha_0 = 0$, $\dim(L_1^*) = \dim(L_2^*) = 1$, $D = 2$ and where $\mu_1$ and $\mu_2$ are uniform distributions on line segments centered on the origin and of length 2 within $L_1^*$ and $L_2^*$. If $\theta$ is the angle between $L_1^*$ and $L_2^*$, then



The lower bound for both $\kappa_0$ and $\delta_0$ in (11) thus follows from (101), (102) and the fact that $\kappa_0 = \delta_0 = C_1/(4p)$.

3.7. A Counterexample for Exact Asymptotic Recovery 

Theorem 1.2 established near recovery of $L_1^*$ for a spherically symmetric HLM measure $\mu_\epsilon$ when $\epsilon > 0$ and $0 < p \le 1$. It is sometimes more desirable to have exact asymptotic $\ell_p$ recovery of $L_1^*$. It means that if $X = \{x_1, x_2, \dots, x_N\}$ is an i.i.d. sample from $\mu_\epsilon$ and $\hat L_{(N)}$ is the minimizer of $e_{\ell_p}(X, L)$, then $\hat L_{(N)}$ converges to $L_1^*$ w.p. 1 as $N$ approaches infinity. However, this is generally not true for any $p > 0$ when $K > 1$ and $\epsilon > 0$. Indeed, we provide here a simple counterexample, whose verification follows the proof of Theorem 1.3.

We assume a spherically symmetric HLM measure $\mu_\epsilon$ with $\nu_{1,\epsilon}$ symmetric around the origin satisfying $\nu_{1,\epsilon}(0) = 0$ and $\nu_{i,\epsilon} = \delta$, $i = 2, \dots, K$ (i.e., there is no noise around $\{L_i^*\}_{i=2}^K$). We say that $L_1^*$ is a global $\ell_p$ subspace in expectation if the expectation




when p >2; 
when 1 < p < 2. 



(101) 



(102) 
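of $e_{\ell_p}(X, L) - e_{\ell_p}(X, L_1^*)$ is nonnegative for any $L \in \mathrm{G}(D,d)$. Since, w.p. 1 the sampled i.i.d. points from $\mu_1 \times \nu_{1,\epsilon}$ do not lie on $L_1^*$ (i.e., $\nu_{1,\epsilon}(0) = 0$), it follows from Proposition 2.2 that a necessary condition for $L_1^*$ to be a local $\ell_p$ subspace in expectation is

\[ \int P_{L_1^*}(x) P_{L_1^*}^{\perp}(x)^T \operatorname{dist}(x, L_1^*)^{p-2} \, d\mu_\epsilon(x) = 0. \tag{103} \]

The symmetry of $\mu_1 \times \nu_{1,\epsilon}$ w.r.t. $L_1^*$ implies that

\[ \int P_{L_1^*}(x) P_{L_1^*}^{\perp}(x)^T \operatorname{dist}(x, L_1^*)^{p-2} \, d(\mu_1 \times \nu_{1,\epsilon})(x) = 0. \tag{104} \]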

Combining (103) and (104), we obtain that 

K 

/ Pli {■^)Pti (x)^dist(x, Liy-^ AiJL,{x) = 0. (105) 

However, the proof of (93) implies that the measure of (105) w.r.t. {L*}f^^ 
is zero. That is, a.e. L\ (w.rt. = 1) is not the global Ip subspace in expectation. 
Consequently, a.e. L\ is not the asymptotic global Ip subspace (since exact asymptotic 
recovery is stronger than recovery in expectation). 

4. Discussion 

We studied the effectiveness of $\ell_p$ minimization for recovering and nearly recovering the most significant subspace within outliers w.o.p. Our setting assumed identical and independent sampling from a spherically symmetric HLM measure (and sometimes weakly spherically symmetric HLM measure) with noise level $\epsilon \ge 0$. A restricted
setting like this is necessary and indeed we described some typical cases where global $\ell_p$ subspaces are different than global $\ell_0$ subspaces for all $0 < p < \infty$.

Our analysis provided some guarantees for the robustness to bounded spherically 
symmetric outliers of the single subspace recovery advocated in [7] as well as sequen- 
tial HLM. The recovery established here is for the theoretical minimizer of the energy 
and not for any algorithmic output. Both [40] and [18] followed some ideas of this pa- 
per in their analysis of a convex relaxation of (1) when p = 1. The theoretical guaran- 
tees of the latter works require a bound on the fraction of outliers, while still assuming 
spherically symmetric outliers. On the other hand, the theory described in this paper 
does not restrict the fraction of spherically symmetric outliers. 

We proceed with possible extensions of this theory. 

4.1. Sub-Gaussian Distributions 

The boundedness of the support of the distributions $\{\mu_i\}_{i=0}^K$ can be weakened by assuming that they are sub-Gaussian. Indeed, this will mainly require using the Hoeffding-type inequality for sub-Gaussian measures of Proposition 5.10 in [32].
4.2. Beyond Spherical Symmetry

It is possible to relax the requirement of spherical symmetry by asking for the "projected distribution onto the sphere" to be uniform; we clarify this notion as follows. Let $P^S$ denote the projection of $\mathbb{R}^D$ onto the sphere $S^{D-1}$, that is, for any $x \in \mathbb{R}^D \setminus \{0\}$: $P^S(x) = x/\|x\|$. For a distribution $\mu$ on $\mathbb{R}^D$, its projection onto the sphere $S^{D-1}$ is defined for any set $E \subseteq S^{D-1}$ by $\mu^S(E) = \mu((P^S)^{-1}(E))$. We claim that we may replace the spherical symmetry of $\mu_0$ by the uniformity of $\mu_0^S$ (this is obvious when reviewing the details of our proofs). Similarly we may replace the spherical symmetry of $\{\mu_i\}_{i=1}^K$ within $\{L_i^*\}_{i=1}^K$ by the uniformity of $\{\mu_i^S\}_{i=1}^K$, where $S$ is the sphere $S^{d-1}$ (within each respective $d$-subspace). The following example suggests a measure $\mu$ on $\mathbb{R}^2$, which is not spherically symmetric, but for which $\mu^S$ is uniform, where $S = S^1$. For simplicity of notation, we describe this example using the complex plane $\mathbb{C}$. Let $\mu$ be a mixture distribution on $\mathbb{C}$ (or $\mathbb{R}^2$) with two components: the first one has density 1/3 on the closed semicircle $\{4 e^{i\theta} : 0 \le \theta \le \pi\}$ and the second one has density 2/3 on the open semicircle $\{2 e^{i\theta} : \pi < \theta < 2\pi\}$. It is clear that the projection of $\mu$ onto $S^1$ is uniform, but $\mu$ is not spherically symmetric.
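The uniformity of the projected measure in this example can be verified with a few lines of arithmetic. The sketch below assumes that "density" means density with respect to arc length, up to a common normalizing constant; the names are hypothetical.

```python
import math

# Each component: (radius, arc-length density, angular interval).
components = [
    (4.0, 1.0 / 3.0, (0.0, math.pi)),            # semicircle of radius 4
    (2.0, 2.0 / 3.0, (math.pi, 2.0 * math.pi)),  # semicircle of radius 2
]

def angular_density(radius, arc_density):
    """Mass per unit angle: an angle increment covers arc length radius * d(theta)."""
    return radius * arc_density

def component_mass(radius, arc_density, interval):
    """Unnormalized total mass of one component."""
    return angular_density(radius, arc_density) * (interval[1] - interval[0])

total = sum(component_mass(*c) for c in components)
# Normalized density of the projected measure with respect to the angle theta.
projected = [angular_density(r, rho) / total for r, rho, _ in components]
radii = {r for r, _, _ in components}
```

Both entries of `projected` equal $1/(2\pi)$, the uniform density on the circle, while the two components live at different radii, so the measure itself is not invariant under rotations.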

In Theorem 2.2 it is also possible to replace the spherical symmetry assumption on $\mu_0$ by symmetry with respect to $L_1^*$, without changing the implication of that theorem. It is even possible to assume a slightly weaker assumption: $E_{\mu_0}(D_{L_1^*,x,p}) = 0$, where $D_{L_1^*,x,p}$ is defined in (91).

A symmetry-type property of $\mu_0$ is crucial for the proof of Theorem 1.1. If $\mu_0$ is not spherically symmetric (or more generally $\mu_0^S$ is not uniform), then only with a restriction on the fraction of outliers can we guarantee the recovery of $L_1^*$ by $\ell_1$ minimization (see Theorem 1.1 of [19]). On the other hand, we may still relax the spherical symmetry of $\{\mu_i\}_{i=1}^K$ within $\{L_i^*\}_{i=1}^K$ and require instead approximate spherical symmetry within $\{L_i^*\}_{i=1}^K$. That is, we require for $i = 2, \dots, K$ that there exist spherically symmetric distributions $\tilde\mu_i$ within $L_i^*$ such that the derivatives $f_i := d\mu_i / d\tilde\mu_i$, $i = 1, \dots, K$ are bounded away from 0 and $\infty$. In this case, (6) is replaced by



4.3. Affine Subspaces 

We restrict the theory of this paper to linear subspaces, since affine subspaces do not fit within the framework of spherically symmetric measures. The common strategy of using homogeneous coordinates, which transform $d$-dimensional affine subspaces in $\mathbb{R}^D$ to $(d+1)$-dimensional linear subspaces in $\mathbb{R}^{D+1}$, is not useful to us since it distorts the structure of both noise and outliers. On the other hand, the theory of [19] can be generalized to affine subspaces (see §5.6 of [19]).

4.4. $p = 1$ Versus $0 < p < 1$

Our main theorems do not distinguish between $p = 1$ and $0 < p < 1$. However, Proposition 2.1 shows that many subspaces can be local $\ell_p$ subspaces when $p < 1$ (in




(106) 
particular, $d$-subspaces spanned by subsets of outliers). Such wealth of local minima clearly does not occur when $p = 1$. An open problem is to estimate the number and depth of local minima when $p = 1$ for spherically symmetric HLM measures.



Appendix A: Supplementary Details 
A.1. The Constant

In order to define ^i, we first define the function 

\[ \psi_{\mu_1}(t) = \mu_1\big( x \in L_1^* : |x^T v| < t \big), \tag{107} \]

where $v$ is an arbitrarily fixed vector in $L_1^*$ (since $\mu_1$ is spherically symmetric within $L_1^*$, $\psi_{\mu_1}$ is independent of $v$). We now define



ei = %l ( 



1 + Mi({0}) 



(108) 



2 

In order to verify that the RHS of (108) is well-defined we establish the following lemma.

Lemma A.1. The inverse function $\psi_{\mu_1}^{-1}(x)$ exists for any $\mu_1(\{0\}) < x < 1$.

Proof. We only need to prove that $\psi_{\mu_1}(t)$ is continuous and strictly increasing on the interval $(0, t_0)$, where $t_0 := \min\{t \in \mathbb{R} : \psi_{\mu_1}(t) = 1\}$. The continuity of $\psi_{\mu_1}(t)$ follows from the following observation:

\[ \mu_1(L \setminus \{0\}) = 0 \quad \text{for any affine subspace } L \subsetneq L_1^*. \tag{109} \]

We prove (109) by induction on $\dim(L)$. We prove it first for $\dim(L) = 0$. We assume on the contrary that there is a point $L$ different than the origin such that $\mu_1(L) > 0$. Let $S'$ denote the sphere centered at the origin and containing this point. Due to the spherical symmetry of $\mu_1$ within $L_1^*$, $\mu_1$ of any point on $S'$ equals $\mu_1(L)$, which is positive, and this contradicts the fact that $\mu_1$ is a probability measure (in particular, finite). We assume next that (109) holds when $\dim(L) \le j - 1$ and prove (109) for $\dim(L) = j$. The idea is similar to the proof of the case where $\dim(L) = 0$, but several clarifications are needed. We first note that if $L'$ is a rotation (different than the identity) of the $j$-dimensional affine subspace $L$ within $L_1^*$, then $L \cap L'$ has dimension smaller than $j$. It thus follows from the induction assumption (i.e., (109) holds when $\dim(L) \le j - 1$) that $\mu_1(L \cap L' \setminus \{0\}) = 0$. Therefore, $\mu_1(L \cup L' \setminus \{0\}) = 2\mu_1(L \setminus \{0\})$. Using these observations, we assume on the contrary that $\mu_1(L \setminus \{0\}) > 0$ and create $N$ distinct subspaces by rotating $L$, where $N > 1/\mu_1(L)$. We note that $\mu_1$ of their union is greater than 1, which is clearly a contradiction.

Next, we prove that $\psi_{\mu_1}(t)$ is strictly increasing on $(0, t_0)$. We assume by contradiction that there exist $t_1, t_2 \in \mathbb{R}$ such that $t_1 < t_2 < t_0$ and $\psi_{\mu_1}(t_1) = \psi_{\mu_1}(t_2)$. It follows from the definition of $\psi_{\mu_1}$ that $\mu_1(x \in L_1^* : t_1 \le |x^T v| < t_2) = 0$ for any $v \in L_1^*$. Since

\[ \{x \in L_1^* : \|x\| \ge t_1\} = \bigcup_{\|v\| = 1,\, v \in L_1^*} \{x \in L_1^* : t_1 \le |x^T v| < t_2\}, \]
we conclude that $\mu_1(x \in L_1^* : \|x\| \ge t_1) = 0$ and therefore $\psi_{\mu_1}(t_1) = 1$, which contradicts our assumption $t_1 < t_0$ and the definition of $t_0$. □

A.2. Proof of (7)

We establish here the following upper bound on $\psi_{\mu_1}$ in the special case where $\mu_1$ is uniform on $B(0,1) \cap L_1^*$ and $L_1^*$ is a $d$-subspace in $\mathbb{R}^D$:

\[ \psi_{\mu_1}(t) \le \frac{2d}{\pi} t. \tag{110} \]

Combining (108), (110) and Lemma A.1, we conclude (7).

Let us denote the volume of the $d$-dimensional unit ball by $v_d$ and the $d$-dimensional volume measure on $L_1^*$ (i.e., Lebesgue measure on $L_1^*$) by $\mathrm{Vol}_d$. We note that

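\[ \{x = (x_1, x_2, \dots, x_d) \in B(0,1) \cap L_1^* : |x_1| < t\} \subseteq \Big\{x = (x_1, x_2, \dots, x_d) \in L_1^* : |x_1| < t,\ |x_2| \le 1,\ \sum_{i=3}^d x_i^2 \le 1 \Big\} \]
and consequently

\[ \mathrm{Vol}_d\{x \in B(0,1) \cap L_1^* : |x_1| < t\} \le 4 v_{d-2} t. \tag{111} \]
Combining (111) with the observation $v_d = \frac{2\pi}{d} v_{d-2}$, we conclude (110) as follows:

\[ \psi_{\mu_1}(t) = \frac{\mathrm{Vol}_d\{x \in B(0,1) \cap L_1^* : |x_1| < t\}}{\mathrm{Vol}_d\{B(0,1) \cap L_1^*\}} \le \frac{4 v_{d-2} t}{v_d} = \frac{2d}{\pi} t. \]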
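The bound (110) and the volume recursion used to derive it can be checked numerically for small dimensions. The following sketch assumes $d \ge 2$ and the uniform distribution on the unit ball; it approximates $\psi_{\mu_1}$ by a midpoint rule, and its function names are hypothetical.

```python
import math

def unit_ball_volume(d):
    """Volume v_d of the d-dimensional unit ball."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def psi_uniform_ball(t, d, steps=20000):
    """psi(t) = P(|x_1| < t) for x uniform on the unit d-ball, d >= 2 (midpoint rule)."""
    t = min(t, 1.0)
    h = t / steps
    # The slice {x_1 = s} of the ball is a (d-1)-ball of radius sqrt(1 - s^2).
    area = sum((1.0 - ((k + 0.5) * h) ** 2) ** ((d - 1) / 2) for k in range(steps)) * h
    return 2.0 * unit_ball_volume(d - 1) * area / unit_ball_volume(d)

def bound_110(t, d):
    """Right-hand side of (110)."""
    return 2.0 * d * t / math.pi
```

For $d = 2$ and small $t$ the two sides nearly coincide, since $\psi_{\mu_1}(t) \approx 4t/\pi$ in that regime.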

A.3. Proof of (20) 

The fact that E^-^ (Pl» (x)Pi» (x)-^) is a scalar matrix follows from the symmetry of 
Hi on L'l n B(0, 1). It is also obvious from (45) that S^, linearly scales with We 
compute 6* when i?i = 1 as follows. We arbitrarily fix a vector v e M'' as well as a 
{d — l)-subspace L C LI orthogonal to v and observe that 

-5* = E^, ((Pi.(x)^ v)2) = E^, (dist(x,L)2) . 

We further note that for any < r < 1, the set {x e B(0, 1) n L| : dist(x, L) = r} 
consists of two (d— 1) -dimensional balls of radius \/l — r-^. We consequently compute 
the constant 5■^, using the beta function B and the Gamma function F in the following 
way: 



5^ = E^, dist^(x,L) = 



_ r(|)r(^)r(^) _ i 
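The slicing computation above can be reproduced numerically. The sketch below assumes the uniform distribution on the unit ball of a $d$-dimensional subspace (the normalized case considered above) and compares a midpoint-rule integral over the slices with a Beta-function evaluation; for this distribution both equal $1/(d+2)$, a standard value of the second moment of one coordinate. Function names are hypothetical.

```python
import math

def unit_ball_volume(d):
    """Volume of the d-dimensional unit ball (equal to 1 for d = 0)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def second_moment_numeric(d, steps=20000):
    """E(dist(x, L)^2) for x uniform on the unit d-ball and L a (d-1)-subspace."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        # Two (d-1)-dimensional balls of radius sqrt(1 - r^2) lie at distance r from L.
        total += r ** 2 * 2.0 * unit_ball_volume(d - 1) * (1.0 - r ** 2) ** ((d - 1) / 2)
    return total * h / unit_ball_volume(d)

def second_moment_closed_form(d):
    """Same integral via B(3/2, (d+1)/2) = Gamma(3/2) Gamma((d+1)/2) / Gamma(d/2 + 2)."""
    beta = math.gamma(1.5) * math.gamma((d + 1) / 2) / math.gamma(d / 2 + 2)
    return unit_ball_volume(d - 1) * beta / unit_ball_volume(d)
```

The substitution $u = r^2$ turns $\int_0^1 r^2 (1 - r^2)^{(d-1)/2} \, dr$ into half of the Beta integral used in `second_moment_closed_form`.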
A.4. Proof of Lemma 3.1

We will use the following inequality, which we verify below in §A.4.1: 

^J^^ (x e B(0, 1) n Li : dist(x,Zi) < /3distG(Li, ii)) < VVi(^/3) V/3 > 0. 

(112) 

We fix /3i = 2 • ^1 / (tt • ^/d) and later prove the existence of this constant. Using the 
fact that distG(ii, Li) = e and applying (112), we obtain that 



Ati [x e B(0, 1) n Li : dist(x, Li) < /3i ej 

= Ml (x e B(0, 1) n Li : dist(x, Li) < /3i distG(ii, Li)J < (1 + Aii({0}))/2. 
Consequently, we derive the following estimate 

Ml (x e B(0, 1) n Li : dist(x, Li) > ^e) > (1 - mi({0}))/2. (113) 
Thus combining (113) with Chebyshev's inequality, the lemma is concluded as follows: 

(l-Mi({0}))2^-^eP 



A.4.1. Proof of (112) 

We denote the principal angles between Li and Li by {Oi}f^i, the principle vectors 
of Li and Li by {v,;}^^^ and {v^jf^]^ respectively, the interaction dimension by fc = 
k{Li,Li) (see §3.2.1) and 



i = 1, . . . , fc. 



Since ^^^^ 7^ = 1, WLOG we assume that 71 > l/k > 1/d. Expressing every point 
X in Li by X = {xi, X2, • • • , x^) ~ (v^x, v|"x, • • • , vjx), we obtain that 

|x e Li : dist(x, Li) < /3distG(Li, 



. xj siv? e,< (3 



x= [xi,X2,--- ,Xd) e Li : 
C <( X = (a;i,X2, • ■ ■ e Li : . ^ sin^ 6*,; < ^Z? a X! 

\ 1=1 \ i=l 



sm 



= < X = (xi,a;2, • • • e Li 
C |x = {xi,X2, ■ ■ ■ ,Xd) e Li : \xi\ < 
cLeL,: |vf x| < ^4 . 



(114) 



We conclude (112) by combining (114) and the following immediate consequence of 
the definition of -0^^ (see (107)): 



A.5. Proof of Lemma 3.2 

We denote the principal angles between the $d$-subspaces $L_1$ and $L_2$ by $\theta_1 \ge \theta_2 \ge \theta_3 \ge \dots \ge \theta_d$. Arbitrarily choosing $Q_1, Q_2 \in \mathrm{O}(D, d)$, representing $L_1$, $L_2$ respectively, we note that

|dist(x, Li) - dist(x, L2)| = I ||x - xQiQf II - ||x - xQaQ^H | 
<||x-xQiQf -x + xQ2Qf|| < ||x|| llQiQf -Q2Q^||f 



. ^sin(0,)2<||x|| 



= ||x||distG(il,£2 
\ i=l 
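The resulting inequality $|\operatorname{dist}(x, L_1) - \operatorname{dist}(x, L_2)| \le \|x\| \operatorname{dist}_G(L_1, L_2)$ can be tested on random subspaces. The sketch below is an illustration under the assumption that $\operatorname{dist}_G$ is the Euclidean norm of the vector of principal angles; it uses column vectors rather than the row-vector convention of the proof, and all names are hypothetical.

```python
import numpy as np

def random_subspace(D, d, rng):
    """Orthonormal basis (D x d, as columns) of a random d-subspace of R^D."""
    Q, _ = np.linalg.qr(rng.normal(size=(D, d)))
    return Q

def dist_to_subspace(x, Q):
    """Euclidean distance from x to the span of the columns of Q."""
    return np.linalg.norm(x - Q @ (Q.T @ x))

def principal_angles(Q1, Q2):
    """Principal angles between the column spans of Q1 and Q2."""
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def dist_G(Q1, Q2):
    """Assumed Grassmannian distance: l2 norm of the principal angles."""
    return np.linalg.norm(principal_angles(Q1, Q2))

rng = np.random.default_rng(1)
D, d = 6, 3                                   # assumed toy dimensions
gaps = []
for _ in range(500):
    Q1, Q2 = random_subspace(D, d, rng), random_subspace(D, d, rng)
    x = rng.normal(size=D)
    lhs = abs(dist_to_subspace(x, Q1) - dist_to_subspace(x, Q2))
    rhs = np.linalg.norm(x) * dist_G(Q1, Q2)
    gaps.append(rhs - lhs)
min_gap = min(gaps)                           # nonnegative if the inequality holds
```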



A.6. Proof of Lemma 3.3 

We assume WLOG that i = 1 in (21). We thus need to prove that for all L e G(£>, d): 



E^, (dist(xi, L)f ) + E^, (dist(x2, L)") 
>E,,, (dist(xi, + E^, (dist(x2, ii)^'). 



(115) 



We denote the principal angles between Li and L2 by {Oi}f^^, the principle vectors of 
Li and L2 by {v^j^^j and {vijf^x and the complementary orthogonal system for L2 
w.T.t. Li by {uijf^i- 

We notice that we can restrict the set of subspaces L satisfying (115). First of all, 
we only need to consider subspaces 



L e Li + L2. 

Indeed, the LHS of (115) is the same if we replace L by L n {Li + L2). 
Second of all, we claim that it is sufficient to assume that 

Sp(v^, Vj) (^L for all 1 < i < k. 



(116) 



(117) 



We first show this for i — \. We suppose on the contrary to (117) that vi,vi g L. 
Since L is d-dimensional, there exists 2 < j < d (assume WLOG j = 2) such that 
it does not contain both Wj and Vj. For any pair of points x = X^iLi '^i^i ^ 



dist(x, L) = 1 /sin(6'2)^a2 + lyf and dist(x, L) = Wsin(6'i)2a^ + 



where 



dist j a^v;, L J and z^2 = dist j ajV^, L j . 



Now, for L = Sp(i \ {vi, vi}, vi, V2), we obtain that 



dist(x, L) ~ y sin(6'i)^a^ + sin(6'2)^a| + and dist(x, L) = vi. 
Therefore 

dist(x, L)P + dist(x, L)P < dist(x, L)p + dist(x, i)^ 
and by direct integration we have that 

E^, (dist(xi , L)P) + E^, (dist(x2, L)P) 
<E^, (dist(xi, i)f ) + E^., (dist(x2, L)P). 



(118) 



Since L satisfies (117) for i = 1 and satisfies (118), we conclude that proving (115) 
only for L satisfying (117) with i ~ 1 implies it for all L E G{D,d). Similarly, 
we can assume that L satisfies (117) for all 1 < i < k, by verifying (118) for L = 
Sp(L \ {vi, Vi}, Vi, Vj) for some 1 < j ^ i < k such that Sp(vj, Vj) ^ L. 

It follows from (116) and (117) that L can be represented as follows: 



where 



V, = cos f , V, + sm f ,, u, . 



Thus, for any pair of points x = X^iLi '^■i'^i ^ ^1 ^ = X]f=i '^i'^i ^ 



dist(x, L) 



A ^sin^6'*a^, dist(x, L) 
\ i=i 



(119) 



dist(x, Li) = and dist(x, Li) = 



sin^ 0, a?. 



(120) 



Applying (119), (120), the triangle inequality (for "sine vectors" in M"*) and then the 
subadditivity of the sine function, we conclude that 



dist(x, L) + dist(x, L) > 



\ 1=1 
> 



. sin^ Oiof = dist(x, Li) + dist(x, Li). 



Since p < 1, this inequality clearly implies that 

dist(x, L)P + dist(i, L)P > dist(x, Li)p = dist(x, Li)p + dist(x, Li)p. (121) 
We conclude (1 15) by appropriately integrating (121) and consequently prove the lemma. 

A.7. Proof of (46) 

We denote B = J2i'=i {^i)PLi (xi)^ and note that if maxi<j<d cTj (B — d^Id) < 
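$\eta$, then

\[ \frac{\|Bv - \delta_* v\|}{\|v\|} < \eta \quad \text{for } v \in \mathbb{R}^d \setminus \{0\}, \]

and consequently

\[ \delta_* - \eta < \frac{\|Bv\|}{\|v\|} \quad \text{for all } v \in \mathbb{R}^d \setminus \{0\}, \]

that is, $\min_{1 \le j \le d} \sigma_j(B) > \delta_* - \eta$.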
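This perturbation step is easy to test numerically. In the sketch below the matrix `B` is a hypothetical stand-in, built as $\delta_* I_d$ plus a symmetric perturbation of spectral norm below $\eta$; the values of `d`, `delta_star` and `eta` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, delta_star, eta = 4, 0.2, 0.05       # assumed illustrative values

# Hypothetical B: delta_star * I plus a symmetric perturbation with spectral norm 0.9 * eta.
E = rng.normal(size=(d, d))
E = (E + E.T) / 2
E *= 0.9 * eta / np.linalg.norm(E, 2)
B = delta_star * np.eye(d) + E

# Largest singular value of B - delta_star * I and smallest singular value of B.
max_dev = np.linalg.svd(B - delta_star * np.eye(d), compute_uv=False).max()
min_sv = np.linalg.svd(B, compute_uv=False).min()
```

Here `max_dev < eta` plays the role of the hypothesis and `min_sv > delta_star - eta` the role of the conclusion.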

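A.8. Proof of (101)

We first prove the following two lemmata.
Lemma A.2. For $p > 1$ and any $x, y \in B(0,1)$,

\[ \big\| \|x\|^{p-2} x - \|y\|^{p-2} y \big\| \le \begin{cases} 2^{3-p} \|x - y\|^{p-1}, & \text{if } 1 < p < 2; \\ (p-1) \|x - y\|, & \text{if } p \ge 2. \end{cases} \]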



Proof. First we consider the case where either $\|x\| = 1$ or $\|y\| = 1$. WLOG we assume that $\|x\| = 1$. When $p \ge 2$

\[ \big\| \|x\|^{p-2} x - \|y\|^{p-2} y \big\| = \big\| x - \|y\|^{p-2} y \big\| \le \|x - y\| \, \frac{\|x - y\| + \big\| y - \|y\|^{p-2} y \big\|}{\|x - y\|} \]
\[ \le \|x - y\| \, \frac{1 - \|y\|^{p-1}}{1 - \|y\|} \le (p-1) \|x - y\|, \]

where the second inequality follows from the identity $1 - \|y\| + \big\| y - \|y\|^{p-2} y \big\| = 1 - \|y\|^{p-1}$, the inequality $\|x - y\| \ge 1 - \|y\|$ and the fact that the function $f(t) = (t + c)/t$ is non-increasing for $c > 0$.

On the other hand, when $1 < p < 2$

\[ \big\| \|x\|^{p-2} x - \|y\|^{p-2} y \big\| = \big\| x - \|y\|^{p-2} y \big\| \le \|x - y\| + \big\| y - \|y\|^{p-2} y \big\| \le 2 \|x - y\|, \tag{122} \]

where the last inequality of (122) follows from the inequality

\[ \big\| y - \|y\|^{p-2} y \big\| \le \big\| y/\|y\| - y \big\| \le \|x - y\|, \tag{123} \]

which we explain as follows. Since $y$, $\|y\|^{p-2} y$ and $y/\|y\|$ lie on the same line through the origin and since $\|y\| \le \big\| \|y\|^{p-2} y \big\| \le 1$, $\|y\|^{p-2} y$ is located between $y$ and $y/\|y\|$ and this clarifies the first inequality in (123). The second inequality in (123) follows from the following observation: $\big\| y/\|y\| - y \big\| = 1 - \|y\| = \|x\| - \|y\| \le \|x - y\|$.

The main idea of the proof for the general case is to arbitrarily fix $\|x - y\|$ and maximize $\big\| \|x\|^{p-2} x - \|y\|^{p-2} y \big\|$. We transform the problem into maximization over the two variables $r = \log(\|x\|/\|y\|)$ and $t = 2 x^T y / (\|x\| \|y\|)$ of the function



when $\|x - y\| > 0$ is fixed (if $\|x - y\| = 0$ then (A.2) is trivial).

We first find the boundary of the domain of this function when $c_0 := \|x - y\|$ is fixed. We then maximize the function on the boundary and later find a local maximizer within the interior of this domain. The variable $t$ obtains values in $[-2, 2]$. For any fixed $t$, we find the values that $r$ may obtain. We note that $\|x\|^2 + \|y\|^2 - t \|x\| \|y\| = c_0^2$ and $e^{2r} + 1 - t e^r = c_0^2 / \|y\|^2$. Since $\|y\| \le 1$, if $t$ is fixed and $r < 0$, then $r$ is in the domain $e^{2r} + 1 - t e^r \ge c_0^2$, whose boundary is $e^{2r} + 1 - t e^r = c_0^2$. That is, when $r < 0$ (i.e., $\|x\| < \|y\|$), then $\|y\| = 1$. Similarly, when $r > 0$ the boundary of the domain of $h(r, t)$ corresponds to the case $\|x\| = 1$.

Next, we verify (101) for points on the boundary of the domain of $h(r, t)$ (it is sufficient to verify it for maximizers on this boundary). For fixed $-2 < t < 2$, points on the boundary correspond to $\|x\| = 1$ or $\|y\| = 1$ and we have already verified (101) in this case. We also need to consider the boundary points $t = -2$ or $t = 2$, equivalently, $x/\|x\| = -y/\|y\|$ or $x/\|x\| = y/\|y\|$. We thus find the maximal values of $h(r, 2)$ and $h(r, -2)$ (when its denominator is fixed). The function $\sqrt{h(r, -2)}$ (i.e., with $x$ and $y$ satisfying $x/\|x\| = -y/\|y\|$) is equivalent to



Its maximum is obtained when $a = b$ if $1 < p < 2$ and when $a = 0$ or $b = 0$ if $p > 2$. The function $h(r, 2)$ (i.e., with $x$ and $y$ satisfying $x/\|x\| = y/\|y\|$) is equivalent to



Using the convexity/concavity of the power function $x^{p-1}$ for different values of $p$ we note that if $p > 2$ then its maximum is obtained when $a = 1$ or $b = 1$ and if $1 < p < 2$ then its maximum is obtained when $b = 0$. It is immediate to note that (101) is satisfied when $a = 0$ (i.e., $x = 0$) or $b = 0$ (i.e., $y = 0$). We have also verified above that it is satisfied when $a = 1$ or $b = 1$. We also show that (101) is satisfied when $a = b$ and




(a + 6)P-i 



where a = ||x|| and b = ||y| 



(a-6)P-i ' 
$1 < p < 2$. Indeed, $\|x - y\| \le \|x\| + \|y\| = 2\|x\|$ and thus $\|x - y\|^{p-2} \ge (2\|x\|)^{p-2}$, which implies that

\[ 2^{3-p} \|x - y\|^{p-1} = 2^{3-p} \|x - y\|^{p-2} \|x - y\| \ge 2^{3-p} (2\|x\|)^{p-2} \|x - y\| \]
\[ = 2 \|x\|^{p-2} \|x - y\| = 2 \big\| \|x\|^{p-2} x - \|y\|^{p-2} y \big\| \ge \big\| \|x\|^{p-2} x - \|y\|^{p-2} y \big\|. \]

We therefore verified (101) for points corresponding to the boundary of $h$.

At last, we consider the interior of the domain of h. If (rg, to) is a local maximizer 
of h{r, t), then 

„ d,, , (e'^«+e--"+fo)-(p-l)(e(P-i)'-o+e-(P-i)'-o+fo) 

(J = —hir.t) = ^ 

At ^ ' ^ (r,t)=(ro,to) (e'-o + e-'-o + 

and [e"" + e"''" + to) = b - l)(e(P-i)'^o + e-fP-i)''" + ^o)- Therefore 

p-1 



Furthermore, its maximal value (when Iq is fixed) is obtained when t-q = or rg = oo 
or To — ~oo. Equivalently, it is obtained when a = or a = or 6 = 0. To conclude 
the proof we only need to verify that (101) is satisfied when a = b and any 1 < p < 2 
(all the other cases were discussed above). In this case, we use the fact that a < 1 and 
< I and consequently note that 

llllxf-^x - llyf-Vll = «^-'l|x - yll < l|x - y|| < (p - l)i|x - yl|. 

□ 
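The inequality of Lemma A.2 can be probed with random points of the unit ball. The following sketch is only a numerical sanity check under assumed choices (dimension 3, a handful of exponents, 2000 random pairs per exponent); it does not replace the proof, and its function names are hypothetical.

```python
import numpy as np

def power_map(x, p):
    """The map x -> ||x||^(p-2) x, extended by 0 at the origin."""
    n = np.linalg.norm(x)
    return np.zeros_like(x) if n == 0 else n ** (p - 2) * x

def lemma_bound(x, y, p):
    """Right-hand side of Lemma A.2 for x, y in the unit ball."""
    diff = np.linalg.norm(x - y)
    return 2 ** (3 - p) * diff ** (p - 1) if p < 2 else (p - 1) * diff

def random_ball_point(dim, rng):
    """A point drawn uniformly from the unit ball of R^dim."""
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v) * rng.uniform() ** (1.0 / dim)

rng = np.random.default_rng(3)
worst = {}
for p in (1.2, 1.5, 1.9, 2.0, 3.0, 5.0):
    slack = []
    for _ in range(2000):
        x, y = random_ball_point(3, rng), random_ball_point(3, rng)
        lhs = np.linalg.norm(power_map(x, p) - power_map(y, p))
        slack.append(lemma_bound(x, y, p) - lhs)
    worst[p] = min(slack)   # smallest observed slack; nonnegative if the bound holds
```

For $p = 2$ the map is the identity and the bound holds with equality, so the observed slack there is zero up to rounding.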

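Lemma A.3. If $f, g : \mathbb{R} \to \mathbb{R}$, $g(0) = 0$, $g$ is increasing and

\[ |f'(x_1) - f'(x_2)| \le g(|x_1 - x_2|) \quad \text{for any } x_1, x_2 \in \mathbb{R}, \tag{124} \]
then the following inequality is satisfied, where $\hat x = \operatorname*{arg\,min}_{x \in \mathbb{R}} f(x)$:

\[ f(x_0) - f(\hat x) \ge |f'(x_0)| \cdot g^{-1}(|f'(x_0)|) - \int_0^{g^{-1}(|f'(x_0)|)} g(x) \, dx. \]

Proof. WLOG we assume that $f'(x_0) > 0$. Applying this assumption, (124) and the definition of $\hat x$, we conclude the lemma as follows:

\[ f(x_0) - f(\hat x) \ge f(x_0) - f\big(x_0 - g^{-1}(f'(x_0))\big) = \int_{x_0 - g^{-1}(f'(x_0))}^{x_0} f'(x) \, dx \]
\[ \ge \int_{x_0 - g^{-1}(f'(x_0))}^{x_0} \big( f'(x_0) - g(x_0 - x) \big) \, dx = f'(x_0) g^{-1}(f'(x_0)) - \int_0^{g^{-1}(f'(x_0))} g(x) \, dx. \]

□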
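Two simple functions illustrate Lemma A.3. The sketch below assumes $f(x) = x^2$ with $g(t) = 2t$ (where the bound is attained with equality) and $f(x) = \sqrt{1 + x^2}$ with $g(t) = t$; these test functions and all names are choices made here for illustration only.

```python
import math

def lemma_a3_rhs(fprime_x0, g, g_inv, steps=20000):
    """|f'(x0)| g^{-1}(|f'(x0)|) minus the integral of g over [0, g^{-1}(|f'(x0)|)]."""
    s = abs(fprime_x0)
    upper = g_inv(s)
    h = upper / steps
    integral = sum(g((k + 0.5) * h) for k in range(steps)) * h   # midpoint rule
    return s * upper - integral

def gap_quadratic(x0):
    """f(x) = x^2: f' is 2-Lipschitz, so g(t) = 2t; the minimum value is 0."""
    return x0 ** 2, lemma_a3_rhs(2 * x0, lambda t: 2 * t, lambda s: s / 2)

def gap_sqrt(x0):
    """f(x) = sqrt(1 + x^2): |f''| <= 1, so g(t) = t; the minimum value is 1."""
    fp = x0 / math.sqrt(1 + x0 ** 2)
    return math.sqrt(1 + x0 ** 2) - 1.0, lemma_a3_rhs(fp, lambda t: t, lambda s: s)
```

Each function returns the pair (left-hand side, right-hand side) of the lemma at the point `x0`.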



To prove (101), we restrict $E_\mu(e_{\ell_p}(x, L))$ to a geodesic line $L : [0, \infty) \to \mathrm{G}(D,d)$ with $L(0) = L_1^*$. Then we use the following inequality to find the lower bound of $C_1$:

\[ C_1 = E_\mu(e_{\ell_p}(x, L_1^*)) - \min_{L \in \mathrm{G}(D,d)} E_\mu(e_{\ell_p}(x, L)) \ge E_\mu(e_{\ell_p}(x, L(0))) - \min_{t \ge 0} E_\mu(e_{\ell_p}(x, L(t))). \tag{125} \]

The lower bound of the RHS of (125) will be obtained by applying Lemma A.3 to

$f(t) = E_\mu(e_{\ell_p}(x, L(t)))$ with a specific $L(t)$.

We choose this $L(t)$ such that $\operatorname{dist}_G(L(0), L(1)) = 1$ and

\[ \frac{d}{dt} E_\mu(e_{\ell_p}(x, L(t))) \Big|_{t=0} = -p \, \|E_\mu(D_{L_1^*,x,p})\|_F. \tag{126} \]

To show that this is possible, we recall (see (36)) that

\[ \frac{d}{dt} E_\mu(e_{\ell_p}(x, L(t))) \Big|_{t=0} = -p \operatorname{tr}\big( C V E_\mu(D_{L_1^*,x,p}) U^T \big), \tag{127} \]

where $\|C\|_F = 1$ (since we use the distance defined in (2)). Let us denote the thin SVD of $E_\mu(D_{L_1^*,x,p})$ by $V_0 \Sigma_0 U_0^T$. We choose the matrices $V$, $U$ and $C$, which determine $L(t)$, as follows: $V = V_0^T$, $U = U_0^T$ and $C = \Sigma_0 / \|\Sigma_0\|_F$. This choice indeed implies (126) as a consequence of (127) and the following observation:

\[ p \operatorname{tr}\big( C V E_\mu(D_{L_1^*,x,p}) U^T \big) = p \operatorname{tr}(\Sigma_0^2) / \|\Sigma_0\|_F = p \|\Sigma_0\|_F = p \|E_\mu(D_{L_1^*,x,p})\|_F. \]

We proceed by finding $g$ for $f(t) = E_\mu(e_{\ell_p}(x, L(t)))$ so that (124) is satisfied. It follows from (127) that for $t_2 > t_1 \ge 0$:

\[ |f'(t_2) - f'(t_1)| \le p \, E_\mu \big\langle C, V (D_{L(t_1),x,p} - D_{L(t_2),x,p}) U^T \big\rangle_F \le p \, E_\mu \| D_{L(t_1),x,p} - D_{L(t_2),x,p} \|_F. \tag{128} \]

Combining the following observation 

||Pi(t,)(x) - Pi(i,)(x)|| < - < d\stG{Hh),L{t2)) ^h- h, 

with the following consequence of Lemma A. 2 

\\Pn,,y (x)dist(x, _ p^^^^^^ (x)dist(x, L(t2))^^-') II 



< 



2'"1I^L(tO-W-^ife)-Wr"'' whenl<p<2; 
b- l)II^L(ti)4x) - PLit,)A^)l whenp > 2. 



and the facts that 

||Pi(,,)4x)dist(x,i(ti))'''-''ll < 1 and ||Pl(,,)(x)|| < 1, 
we obtain that 

l|Di(t,).x.p-Di(,,),x,p||F = ||Pi(,,)(x)Pi(,,).(x)^dist(x,L(ti))(f^2) 
^Lfe) (x)Pl(*,)x (x)^dist(x, L{t2)iP-^^\ 



\F 



+ ||Pi(,,)(x)||||Pi(,^)4x)dist(x,i(ti))(?'-2)-P^(,^)4x)dist(x,L(t2))(^-')|| 

<||Pi(,,)(x)-Pi(,,)(x)|| 

+ \\PL(t,)^ (x)dist(x, - Pi(t,)x (x)dist(x, II 

\t2-h) + {p-l){t2-ti), whenp>2; 
(t2 - ti) + (i2 - when l<p<2 



< 



^\p{t2-ti), whenp>2; 
~ [24-P niax((t2 -ti)^"\t2 -ii), when 1< p < 2. 

In view of (124), (128), (129) and our choice of /, we define: 

= , , , whenp>2; 

^' |2'i-P max(tP-\t), whenl<p<2. 

We note that the inverse function g^^ is 



when p > 2; 

^ min(2P-4 (2P"4 i)3rrT)^ when 1< p < 2. 



Applying Lemma A. 3 with / and g as above and xq ~ 0, we prove (101) as follows. 
We denote ci = \\Ef, (tr(CVDLj,x,pU^)) ||f- When p > 2, /'(xq) = pci and 

2 2 2 2 

Ci ^ • / pxax — 



P Jo P "^P 

=§ = fl|i^.(MCVD^.,.,,U^))|||. 

When 1 < p < 2, applying 

tr(CVDij,x,pU^) < ||C||j^||VDi.,,,pU^||;^ = ||C||^^||DL..x.p||i. 
<||Cl|^||||Pi.(x)|| ||P^.4x)^dist(x,L(ti))^^"'^ll < 1, 

we conclude that ci < land2P^'*pci < 2p-^ci < 1. Therefore, g-i(t) = {2P-^t)'^ 

p-4 1 

2p-i tp-1 for <t < pci and 

p-4 1 

/.2P-1 (pci)P-i 

Ci > PCi • 2^ (pci)^ - / 2'^~PxP~^dx = 2^ (pci)^ 

Jo 

-P^2^cf' = {p-l)p^2l^\\E^, (tr(CVDL.,,,,pU^)) ||f^. 